---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    out.width = "100%"
)
set.seed(5118)
```

# **BIRDiE**: Estimating disparities when race is not observed

[![R-CMD-check](https://github.com/CoryMcCartan/birdie/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/CoryMcCartan/birdie/actions/workflows/R-CMD-check.yaml)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version-last-release/birdie)](https://cran.r-project.org/package=birdie)
![CRAN downloads](https://cranlogs.r-pkg.org/badges/grand-total/birdie)

Bayesian Instrumental Regression for Disparity Estimation (BIRDiE) is a class of
Bayesian models for accurately estimating conditional distributions by race,
using Bayesian Improved Surname Geocoding (BISG) probability estimates of
individual race.
This package implements BIRDiE as described in [McCartan, Fisher, Goldin, Ho, and Imai (2024)](https://www.nber.org/papers/w32373).
It also implements standard BISG and an improved measurement-error BISG model as described
in [Imai, Olivella, and Rosenman (2022)](https://www.science.org/doi/full/10.1126/sciadv.adc9824).

*[Figure: BIRDiE overview poster]*

## Installation

You can install the latest version of the package from CRAN with:

``` r
install.packages("birdie")
```

You can also install the development version with:

``` r
# install.packages("remotes")
remotes::install_github("CoryMcCartan/birdie")
```

## Basic Usage

A basic analysis has two steps.
First, you compute BISG probability estimates with the `bisg()` or `bisg_me()` functions (or using any other probabilistic race prediction tool).
Then, you estimate the distribution of an outcome variable by race using the `birdie()` function.

```{r}
library(birdie)

data(pseudo_vf)

head(pseudo_vf)
```

To compute BISG probabilities, you provide the last name and (optionally) geography variables as part of a formula.

```{r}
r_probs = bisg(~ nm(last_name) + zip(zip), data=pseudo_vf)

head(r_probs)
```
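
To make the idea behind these probabilities concrete, here is a minimal base-R sketch of the Bayes' rule calculation that BISG is built on. It is not the package's implementation, and every number below is made up for illustration (these are not Census figures). It assumes, as BISG does, that surname and geography are independent given race.

```r
# Hypothetical toy inputs for one individual (NOT real Census figures).
# P(race | ZIP code): racial composition of the individual's ZIP code.
p_r_given_g <- c(white = 0.50, black = 0.20, hisp = 0.20,
                 asian = 0.05, other = 0.05)
# P(surname | race): share of each racial group with this surname.
p_s_given_r <- c(white = 0.0001, black = 0.0002, hisp = 0.0040,
                 asian = 0.0001, other = 0.0002)

# Bayes' rule, assuming surname and geography are independent given race:
# P(race | surname, ZIP) is proportional to P(surname | race) * P(race | ZIP).
unnorm <- p_s_given_r * p_r_given_g
p_r_given_sg <- unnorm / sum(unnorm)

round(p_r_given_sg, 3)
```

Each row of `r_probs` plays the same role as `p_r_given_sg` here: one probability per racial group, summing to one.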

Computing regression estimates requires specifying a model structure.
Here, we'll use a Categorical-Dirichlet regression model that lets the
relationship between turnout and race vary by ZIP code.
This is the "no-pooling" model from McCartan et al. (2024).
We'll use Gibbs sampling for inference, which will also let us capture the uncertainty in our estimates.

```{r}
fit = birdie(r_probs, turnout ~ proc_zip(zip), data=pseudo_vf,
             family=cat_dir(), algorithm="gibbs")

print(fit)
```
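
For intuition about what such a model estimates, the sketch below fits a stripped-down version in base R: a single covariate cell, two hypothetical groups, a binary outcome, and simulated data. It uses an EM algorithm rather than the Gibbs sampler above, so it returns point estimates only, and it should be read as an illustration under these assumptions rather than as the package's code. All names (`p_a`, `true_theta`, `alpha`) are invented for the example.

```r
set.seed(1)
n <- 2000
groups <- c("a", "b")

# Hypothetical "BISG-like" prior probabilities of group membership.
p_a <- runif(n, 0.05, 0.95)
r_probs_toy <- cbind(a = p_a, b = 1 - p_a)

# Simulate the unobserved group, then a binary outcome that depends on it.
group <- ifelse(runif(n) < p_a, "a", "b")
true_theta <- c(a = 0.7, b = 0.4)  # P(Y = 1 | group)
y <- rbinom(n, 1, true_theta[group])

# Indicator matrix of outcomes (n x 2), columns "0" and "1".
Y <- cbind(`0` = as.numeric(y == 0), `1` = as.numeric(y == 1))

# theta[r, y] = P(Y = y | R = r), with a symmetric Dirichlet(alpha) prior.
alpha <- 1  # flat prior, so the M-step is a weighted proportion
theta <- matrix(0.5, 2, 2, dimnames = list(groups, c("0", "1")))

for (it in 1:200) {
    # E-step: P(R_i = r | Y_i) is proportional to prior_i(r) * theta[r, Y_i].
    lik <- t(theta[, y + 1])  # n x 2 matrix of theta[r, y_i]
    post <- r_probs_toy * lik
    post <- post / rowSums(post)
    # M-step: posterior-weighted outcome counts plus prior pseudo-counts.
    counts <- t(post) %*% Y + (alpha - 1)
    theta <- counts / rowSums(counts)
}

round(theta, 3)
```

Each row of `theta` is an estimated outcome distribution for one group. In the no-pooling setting described above, a separate set of such distributions would be estimated for every ZIP code.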

The `proc_zip()` function fills in missing ZIP codes, among other things.
We can extract the estimated conditional distributions with `coef()`.
We can also get updated BISG probabilities that additionally condition on turnout using `fitted()`.
Additional functions allow us to extract a tidy version of our estimates (`tidy()`)
and visualize the estimated distributions (`plot()`).

```{r}
coef(fit)

head(fitted(fit))

tidy(fit)

plot(fit)
```
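
The updated probabilities can be understood as one more application of Bayes' rule: the BISG probability acts as a prior, and the estimated outcome distribution for each group acts as the likelihood. The base-R sketch below illustrates this for a single hypothetical voter. The numbers and the helper `update_probs()` are invented for the example and are not part of the package.

```r
# Hypothetical BISG probabilities P(race | surname, ZIP) for one voter.
p_bisg <- c(white = 0.60, black = 0.30, hisp = 0.10)
# Hypothetical estimated turnout rates P(turnout = yes | race).
p_turnout <- c(white = 0.35, black = 0.30, hisp = 0.20)

# Illustrative helper (not a birdie function): condition the prior on
# whether the individual voted.
update_probs <- function(prior, p_yes, voted) {
    lik <- if (voted) p_yes else 1 - p_yes
    post <- prior * lik
    post / sum(post)
}

p_voted <- update_probs(p_bisg, p_turnout, voted = TRUE)
p_not_voted <- update_probs(p_bisg, p_turnout, voted = FALSE)

round(rbind(prior = p_bisg, voted = p_voted, not_voted = p_not_voted), 3)
```

In this toy example, observing that the person voted shifts probability toward the group with the highest assumed turnout rate, and observing that they did not vote shifts it away.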

A more detailed introduction to the method and software package can be found
on the [Get Started](https://corymccartan.com/birdie/articles/birdie.html) page.