https://github.com/rudeboybert/math241
Reed College Data Science
- Host: GitHub
- URL: https://github.com/rudeboybert/math241
- Owner: rudeboybert
- Created: 2015-01-23T18:50:10.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2016-03-23T14:55:51.000Z (over 8 years ago)
- Last Synced: 2024-10-09T22:06:28.921Z (about 1 month ago)
- Language: HTML
- Homepage:
- Size: 200 MB
- Stars: 6
- Watchers: 4
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
README
---
title: "Case Studies in Statistical Analysis AKA Data Science"
output:
  html_document:
    keep_md: yes
---

This is the GitHub repository for Reed College's Spring 2015 MATH 241 Case Studies in Statistical Analysis AKA Data Science.
* All slide presentations from this class can be found at [RPubs](http://rpubs.com/rudeboybert/) and are tagged: MATH 241.
* The syllabus can be found [here](https://docs.google.com/spreadsheets/d/1HQPtHvPLQl_meSeJK372oXmkY7BVD4rCOamMSwTfaBI/pubhtml?gid=0&single=true).
* The summary presentation titled "Teaching Data Science to Undergrads: an ex-Googler’s Tales from the Trenches" can be found at [RPubs](http://rpubs.com/rudeboybert/Teaching_Data_Science_Ugrads). The code to generate the slides can be found in the directory `Teaching_Data_Science_Ugrads`.

```{r, echo=FALSE, message=FALSE, warning=FALSE}
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggthemes))
```
## Examples
### Flight data from Houston Airport (IAH)
Credit to Rennie Meyers
```{r, echo=FALSE, message=FALSE, fig.height=6}
# Load the course data sets; date columns are read as strings and converted to Date
flights <- read.csv("./Lec06 R Markdown + HW01/flights.csv", stringsAsFactors = FALSE) %>%
  tbl_df() %>%
  mutate(date=as.Date(date))
weather <- read.csv("./Lec06 R Markdown + HW01/weather.csv", stringsAsFactors = FALSE) %>%
  tbl_df() %>%
  mutate(date=as.Date(date))
planes <- read.csv("./Lec06 R Markdown + HW01/planes.csv", stringsAsFactors = FALSE) %>%
  tbl_df()
airports <- read.csv("./Lec06 R Markdown + HW01/airports.csv", stringsAsFactors = FALSE) %>%
  tbl_df()
states <- read.csv("./Lec06 R Markdown + HW01/states.csv", stringsAsFactors = FALSE) %>%
  tbl_df()

# For each day, count the flights that departed more than 30 minutes late
flight_delays <- flights %>%
  select(date, dep_delay) %>%
  group_by(date)
flight_delays30 <- filter(flight_delays, dep_delay > 30) %>%
  count(date)

# Plot the daily counts with a smoothed trend line
ggplot(data=flight_delays30, aes(x=date, y=n)) +
  geom_line(stat="identity") +
  xlab("Date") +
  ylab("Number of Flights Delayed longer than 30 minutes") +
  ggtitle("Departure Delays from Houston over a Year") +
  theme_economist() +
  geom_smooth(col="blue")
```

### Sex breakdown for different jobs on San Francisco OkCupid
Credit to Miguel Connor
```{r, echo=FALSE, message=FALSE, fig.width=10, fig.height=6}
profiles <- read.csv("./Lec09 OkCupid Data/profiles.csv", header=TRUE) %>% tbl_df()
ggplot(profiles, aes(job, fill = sex)) +
  geom_bar(position = "dodge") +
  xlab("Job") +
  ylab("Counts") +
  theme(axis.text.x=element_text(angle=45, hjust=1))
```

### Sex breakdown for different jobs on San Francisco OkCupid