Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/rudeboybert/math241

Reed College Data Science
https://github.com/rudeboybert/math241

Last synced: 13 days ago
JSON representation

Reed College Data Science

Awesome Lists containing this project

README

        

---
title: "Case Studies in Statistical Analysis AKA Data Science"
output:
html_document:
keep_md: yes
---

This is the GitHub repository for Reed College's Spring 2015 MATH 241 Case Studies in Statistical Analysis AKA Data Science.

* All slide presentations from this class can be found at [RPubs](http://rpubs.com/rudeboybert/) and is tagged: MATH 241.
* The syllabus can be found at [here](https://docs.google.com/spreadsheets/d/1HQPtHvPLQl_meSeJK372oXmkY7BVD4rCOamMSwTfaBI/pubhtml?gid=0&single=true).
* The summary presentation titled "Teaching Data Science to Undergrads: an ex-Googler’s Tales from the Trenches" can be found at [RPubs](http://rpubs.com/rudeboybert/Teaching_Data_Science_Ugrads). The code to generate the slides can be found in the directory `Teaching_Data_Science_Ugrads`.

```{r, echo=FALSE, message=FALSE, warning=FALSE}
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggthemes))
```

## Examples

### Flight data from Houston Airport (IAH)

Credit to Rennie Meyers

```{r, echo=FALSE, message=FALSE, fig.height=6}
flights <- read.csv("./Lec06 R Markdown + HW01/flights.csv", stringsAsFactors = FALSE) %>%
tbl_df() %>%
mutate(date=as.Date(date))
weather <- read.csv("./Lec06 R Markdown + HW01/weather.csv", stringsAsFactors = FALSE) %>%
tbl_df() %>%
mutate(date=as.Date(date))
planes <- read.csv("./Lec06 R Markdown + HW01/planes.csv", stringsAsFactors = FALSE) %>%
tbl_df()
airports <- read.csv("./Lec06 R Markdown + HW01/airports.csv", stringsAsFactors = FALSE) %>%
tbl_df()
states <- read.csv("./Lec06 R Markdown + HW01/states.csv", stringsAsFactors = FALSE) %>%
tbl_df()
flight_delays <- flights %>%
select(date, dep_delay) %>%
group_by(date)
flight_delays30 <- filter(flight_delays, dep_delay > 30) %>%
count(date)

ggplot(data=flight_delays30, aes(x=date, y=n)) +
geom_line(stat="identity") +
xlab("Date") +
ylab("Number of Flights Delayed longer than 30 minutes") +
ggtitle("Departure Delays from Houston over a Year") +
theme_economist() +
geom_smooth(col="blue")
```

### Sex breakdown for different jobs on San Francisco OkCupid

Credit to Miguel Connor

```{r, echo=FALSE, message=FALSE, fig.width=10, fig.height=6}
profiles <- read.csv("./Lec09 OkCupid Data/profiles.csv", header=TRUE) %>% tbl_df()
ggplot(profiles, aes(job, fill = sex)) +
geom_bar(position = "dodge") +
xlab("Job") +
ylab("Counts") +
theme(axis.text.x=element_text(angle=45, hjust=1))
```

### Sex breakdown for different jobs on San Francisco OkCupid