Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ellisvalentiner/covid-19-county-model
Bayesian model of COVID-19 cases in U.S. counties
https://github.com/ellisvalentiner/covid-19-county-model
bayesian-statistics covid-19 r stan
Last synced: about 2 months ago
JSON representation
Bayesian model of COVID-19 cases in U.S. counties
- Host: GitHub
- URL: https://github.com/ellisvalentiner/covid-19-county-model
- Owner: ellisvalentiner
- Created: 2020-11-15T16:08:47.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2020-11-16T02:08:10.000Z (about 4 years ago)
- Last Synced: 2023-08-17T01:22:29.652Z (over 1 year ago)
- Topics: bayesian-statistics, covid-19, r, stan
- Homepage:
- Size: 819 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
Awesome Lists containing this project
README
---
output: github_document
always_allow_html: true
editor_options:
chunk_output_type: console
---# COVID-19 U.S. County Model
Bayesian model of COVID-19 cases in U.S. counties.
***
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = FALSE,
message = FALSE,
warning = FALSE,
error = FALSE,
include = FALSE,
collapse = TRUE,
comment ="#>",
fig.retina = 2
)
library(tidyverse)
library(rstan)
library(tidybayes)
library(sf)
library(leaflet)
library(scales)
library(glue)rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
```## Data
The data is from the [COVID-19 Event Risk Planner](https://github.com/appliedbinf/covid19-event-risk-planner), which combines data from several sources including the [NYTimes COVID19 data project](https://github.com/nytimes/covid-19-data) and [U.S. Census](https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-total.html).
It includes U.S. county-level COVID-19 data such as number of cases, deaths, and population.```{r get-data}
covid_cases <-
read_csv(
file = "https://raw.githubusercontent.com/appliedbinf/covid19-event-risk-planner/master/usa_risk_counties.csv",
col_types = cols(
GEOID = col_integer(),
NAME = col_character(),
stname = col_character(),
cases = col_double(),
deaths = col_double(),
cases_past = col_double(),
X = col_double(),
pop = col_double(),
Nr = col_double(),
risk = col_double(),
asc_bias = col_double(),
event_size = col_double()
)
) %>%
arrange(GEOID) %>%
remove_missing(
na.rm = TRUE,
vars = c("GEOID", "cases", "pop")
) %>%
select(GEOID, NAME, stname, cases, pop) %>%
distinct()counties <-
sf::read_sf(
dsn = "https://raw.githubusercontent.com/appliedbinf/covid19-event-risk-planner/master/COVID19-Event-Risk-Planner/map_data/tl_2017_us_county.geojson"
)data <-
covid_cases %>%
left_join(counties)
```## Stan Model
I fit a hierarchical binomial model for the counts of COVID-19 cases in each U.S. county.
The model treats each county as population members and uses partial pooling to estimate county-level COVID-19 cases.
Partial pooling means the county-level COVID-19 probabilities are modeled by a distribution.
This allows for information sharing among these parameters.The Stan model is below:
```{r model}
model_code <- "
data {
int N; // counties
int y[N]; // cases
int K[N]; // populations
}
parameters {
real phi; // population chance of covid
real kappa; // population concentration
vector[N] theta; // chance of covid
}
model {
kappa ~ pareto(1, 1.5); // hyperprior
theta ~ beta(phi * kappa, (1 - phi) * kappa); // prior
y ~ binomial(K, theta); // likelihood
}
"
``````{stan, echo = TRUE, include=TRUE, output.var="model"}
data {
int N; // counties
int y[N]; // cases
int K[N]; // populations
}
parameters {
real phi; // population chance of covid
real kappa; // population concentration
vector[N] theta; // chance of covid
}
model {
kappa ~ pareto(1, 1.5); // hyperprior
theta ~ beta(phi * kappa, (1 - phi) * kappa); // prior
y ~ binomial(K, theta); // likelihood
}
``````{r fit, cache=TRUE}
fit <-
stan(
model_code = model_code,
data=list(
"N"=nrow(data),
"K"=data$pop,
"y"=data$cases
),
iter=10000,
chains=4
)
```## Results
```{r post-processing}
predictions <-
fit %>%
recover_types() %>%
gather_draws(theta[i]) %>%
group_by(i) %>%
summarize(
posterior_median = median(.value),
) %>%
ungroup()map_data <-
data %>%
bind_cols(predictions) %>%
mutate(
yhat = pop * posterior_median,
res = yhat - cases,
stdres = res/sd(res),
pct = (yhat - cases)/cases
) %>%
st_sf()
```### COVID-19 Rate
```{r map-estimates, include=TRUE, screenshot.opts=list(zoom = 2)}
leaflet() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
setView(lat = 37.1, lng = -95.7, zoom = 4) %>%
addPolygons(
data = map_data,
color = "#444444",
weight = 0.2,
smoothFactor = 0.1,
opacity = 1.0,
fillOpacity = 0.7,
fillColor = ~colorNumeric(
palette = "viridis",
domain = map_data$posterior_median
)(posterior_median),
highlight = highlightOptions(weight = 1),
label = glue("{map_data$NAME}, {map_data$stname}\n{percent(map_data$posterior_median, accuracy = 0.1)}")
) %>%
addLegend(
data = map_data,
position = "bottomright",
pal = colorNumeric(
palette = "viridis",
domain = map_data$posterior_median
),
values = ~posterior_median,
title = "Rate",
opacity = .7,
labFormat = function(type, cuts, p) {
percent(cuts)
}
)
```### Residuals
```{r map-residuals, include=TRUE, screenshot.opts=list(zoom = 2)}
leaflet() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
setView(lat = 37.1, lng = -95.7, zoom = 4) %>%
addPolygons(
data = map_data,
color = "#444444",
weight = 0.2,
smoothFactor = 0.1,
opacity = 1.0,
fillOpacity = 0.7,
fillColor = ~colorNumeric(
palette = "viridis",
domain = map_data$res
)(res),
highlight = highlightOptions(weight = 1),
label = glue("{map_data$NAME}, {map_data$stname}\n{percent(map_data$stdres, accuracy = 0.1)}")
) %>%
addLegend(
data = map_data,
position = "bottomright",
pal = colorNumeric(
palette = "viridis",
domain = map_data$res
),
values = ~res,
title = "Residuals",
opacity = .7
)
```