https://github.com/corymccartan/birdie
Bayesian Instrumental Regression for Disparity Estimation
- Host: GitHub
- URL: https://github.com/corymccartan/birdie
- Owner: CoryMcCartan
- License: gpl-3.0
- Created: 2021-12-19T00:07:26.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-06-19T16:18:38.000Z (12 months ago)
- Last Synced: 2025-03-17T22:13:23.436Z (3 months ago)
- Language: R
- Homepage: http://corymccartan.com/birdie/
- Size: 25.4 MB
- Stars: 5
- Watchers: 4
- Forks: 3
- Open Issues: 3
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
set.seed(5118)
```

# **BIRDiE**: Estimating disparities when race is not observed
[R-CMD-check](https://github.com/CoryMcCartan/birdie/actions/workflows/R-CMD-check.yaml)
[CRAN status](https://cran.r-project.org/package=birdie)

Bayesian Instrumental Regression for Disparity Estimation (BIRDiE) is a class of
Bayesian models for accurately estimating conditional distributions by race,
using Bayesian Improved Surname Geocoding (BISG) probability estimates of
individual race.
This package implements BIRDiE as described in [McCartan, Fisher, Goldin, Ho, and Imai (2024)](https://www.nber.org/papers/w32373).
It also implements standard BISG and an improved measurement-error BISG model as described
in [Imai, Olivella, and Rosenman (2022)](https://www.science.org/doi/full/10.1126/sciadv.adc9824).
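
To make the BISG step concrete, the sketch below applies Bayes' rule to combine a surname-based race distribution with geographic information. It is a toy illustration in base R, not the package's implementation: the objects `p_race_given_name` and `p_geo_given_race`, the four race categories, and every number are hypothetical. The `bisg()` function draws on real reference tables and handles many details this version ignores.

```r
# Toy BISG calculation with hypothetical numbers (not real Census figures).
races <- c("white", "black", "hisp", "other")

# P(race | surname) for one hypothetical surname
p_race_given_name <- c(white = 0.60, black = 0.25, hisp = 0.10, other = 0.05)

# P(ZIP code | race) for one hypothetical ZIP code: the share of each
# racial group's total population that lives in this ZIP code
p_geo_given_race <- c(white = 0.0001, black = 0.0004, hisp = 0.0002, other = 0.0001)

# Bayes' rule, assuming surname and geography are independent given race:
# P(race | surname, ZIP) is proportional to P(race | surname) * P(ZIP | race)
unnormalized <- p_race_given_name[races] * p_geo_given_race[races]
p_bisg <- unnormalized / sum(unnormalized)

print(round(p_bisg, 3))
```

In this made-up case the geographic information shifts probability toward the group that is relatively concentrated in the ZIP code, even though the surname alone points elsewhere.
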
## Installation
You can install the latest version of the package from CRAN with:
``` r
install.packages("birdie")
```

You can also install the development version with:
``` r
# install.packages("remotes")
remotes::install_github("CoryMcCartan/birdie")
```

## Basic Usage
A basic analysis has two steps.
First, you compute BISG probability estimates with the `bisg()` or `bisg_me()` functions (or using any other probabilistic race prediction tool).
Then, you estimate the distribution of an outcome variable by race using the `birdie()` function.

```{r}
library(birdie)

data(pseudo_vf)
head(pseudo_vf)
```

To compute BISG probabilities, you provide the last name and (optionally) geography variables as part of a formula.
```{r}
r_probs = bisg(~ nm(last_name) + zip(zip), data=pseudo_vf)

head(r_probs)
```

Computing regression estimates requires specifying a model structure.
Here, we'll use a Categorical-Dirichlet regression model that lets the
relationship between turnout and race vary by ZIP code.
This is the "no-pooling" model from McCartan et al.
We'll use Gibbs sampling for inference, which will also let us capture the uncertainty in our estimates.

```{r}
fit = birdie(r_probs, turnout ~ proc_zip(zip), data=pseudo_vf,
             family=cat_dir(), algorithm="gibbs")

print(fit)
```

The `proc_zip()` function fills in missing ZIP codes, among other things.
We can extract the estimated conditional distributions with `coef()`.
We can also get updated BISG probabilities that additionally condition on turnout using `fitted()`.
Additional functions allow us to extract a tidy version of our estimates (`tidy()`)
and visualize the estimated distributions (`plot()`).

```{r}
coef(fit)

head(fitted(fit))
tidy(fit)
plot(fit)
```

A more detailed introduction to the method and software package can be found
on the [Get Started](https://corymccartan.com/birdie/articles/birdie.html) page.
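
For intuition about what Gibbs sampling does in a Categorical-Dirichlet model, the following base R sketch alternates between imputing each individual's unobserved race and updating the outcome distribution for each race. It is a simplified illustration under stated assumptions, not the code that `birdie()` runs. The data are simulated, there are no covariates such as ZIP code, and the names `rdirichlet1`, `theta_hat`, and `p_updated` are hypothetical.

```r
# Toy Gibbs sampler for a Categorical-Dirichlet model with unobserved race.
# All data are simulated; this is an illustrative sketch only.
set.seed(1)

n <- 500       # individuals
n_race <- 3    # hypothetical number of racial groups
n_out <- 2     # outcome levels (e.g., 1 = did not vote, 2 = voted)

# Simulate true race, an outcome that depends on race, and BISG-style
# prior probabilities that favor the true race (each row sums to 1)
true_theta <- rbind(c(0.3, 0.7), c(0.6, 0.4), c(0.5, 0.5))  # P(Y | race)
race <- sample(n_race, n, replace = TRUE)
y <- vapply(race, function(r) sample(n_out, 1, prob = true_theta[r, ]), integer(1))
r_probs <- matrix(0.15, n, n_race)
r_probs[cbind(seq_len(n), race)] <- 0.70

# Draw one Dirichlet vector by normalizing independent gamma draws
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)
}

n_iter <- 200
burn <- 50
alpha <- 1                                  # symmetric Dirichlet prior
theta <- matrix(1 / n_out, n_race, n_out)   # initial P(Y | race)
theta_draws <- array(NA_real_, c(n_iter, n_race, n_out))
post_sum <- matrix(0, n, n_race)

for (it in seq_len(n_iter)) {
  # Step 1: P(race | Y, prior) is proportional to prior * P(Y | race)
  w <- r_probs * t(theta[, y])
  w <- w / rowSums(w)
  r_draw <- apply(w, 1, function(p) sample(n_race, 1, prob = p))

  # Step 2: conjugate Dirichlet update of P(Y | race) from imputed races
  counts <- table(factor(r_draw, levels = seq_len(n_race)),
                  factor(y, levels = seq_len(n_out)))
  for (r in seq_len(n_race)) {
    theta[r, ] <- rdirichlet1(alpha + as.numeric(counts[r, ]))
  }

  theta_draws[it, , ] <- theta
  if (it > burn) post_sum <- post_sum + w
}

# Posterior means after burn-in
theta_hat <- apply(theta_draws[-seq_len(burn), , ], c(2, 3), mean)
p_updated <- post_sum / (n_iter - burn)   # race probabilities given the outcome

print(round(theta_hat, 2))
head(round(p_updated, 2))
```

Here `theta_hat` plays a role analogous to the conditional distributions reported by `coef()`, and `p_updated` is analogous to the outcome-conditioned race probabilities from `fitted()`. The spread of `theta_draws` across iterations is what lets a sampler of this kind express uncertainty.
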