Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ngreifer/lmw

lmw: Linear Model Weights
https://github.com/ngreifer/lmw

Last synced: 10 days ago
JSON representation

lmw: Linear Model Weights

Awesome Lists containing this project

README

        

---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = FALSE,
warning = FALSE,
message = FALSE,
tidy = FALSE,
fig.align='center',
comment = "#>",
fig.path = "man/figures/README-",
R.options = list(width = 200)
)
```
# lmw: Linear Model Weights

[![CRAN_Status_Badge](https://img.shields.io/cran/v/lmw)](https://cran.r-project.org/package=lmw) [![CRAN_Downloads_Badge](https://cranlogs.r-pkg.org/badges/lmw)](https://cran.r-project.org/package=lmw)
------
### Overview

`lmw` computes weights implied by a linear regression model used to estimate an average treatment effect and provides diagnostics that incorporate these weights as described in [Chattopadhyay & Zubizarreta (2023)](https://doi.org/10.1093/biomet/asac058). The treatment effect resulting from this model can be represented as a difference between the weighted outcome means in the treatment groups, similar to inverse probability weighting.

The weights implied by the regression model have several interesting characteristics: they yield exact mean balance between the treatment groups for every covariate included in the model, they have minimum variance among all weights that do so, and they may be negative. Despite yielding exact mean balance between treatment groups, they may not yield balance between the treatment groups and the target population corresponding to the desired estimand; in addition, the negative weights they produce indicate extrapolation beyond the support of the covariates. `lmw` provides tools to compute the implied weights and perform diagnostics to assess balance, extrapolation, influence, and distributional properties of the weights. In addition, `lmw` provides tools to estimate average treatment effects from the specified models that correspond to selected target populations.

Below is an example of the use of `lmw` to obtain and evaluate regression weights for estimating the average treatment effect on the treated (ATT) of a job training program on earnings using the Lalonde dataset (see `help("lalonde", package = "lmw")` for details). Here, the treatment variable is `treat`, the outcome is `re78` (1978 earnings) and `age`, `education`, `race`, `married`, `nodegree`, `re74`, and `re75` are pretreatment covariates.

```{r}
#Load lmw and the data
library("lmw")
data("lalonde")

#Estimate the weights
lmw.out <- lmw(~ treat + age + education + race + married +
nodegree + re74 + re75,
data = lalonde, treat = "treat",
estimand = "ATT", method = "URI")
```

`lmw.out` contains the implied regression weights. We can see that the weighted outcome mean difference between the treatment groups is equal to the coefficient on the treatment in the corresponding outcome regression model:

```{r}
#The weighted difference in means
w <- lmw.out$weights
with(lalonde,
weighted.mean(re78[treat == 1], w[treat == 1]) -
weighted.mean(re78[treat == 0], w[treat == 0]))

#The coefficient in the corresponding outcome model
fit <- lm(re78 ~ treat + age + education + race + married +
nodegree + re74 + re75,
data = lalonde)
coef(fit)["treat"]
```

How well do the weights do at balancing the treatment groups to the target population (in the case, the treated sample)?

```{r}
(s <- summary(lmw.out))
```

The `SMD` column contains the standardized mean differences for each covariate between the treatment groups; after weighting, the values are all equal to zero because of the properties of the implied regression weights. However, the difference between each treatment group and target sample remain, as displayed in the `TSMD Treated` and `TSMD Control` columns, which contain the standardized mean differences between each treatment group and the target sample (which in this case is the treated group because the ATT was requested).

We can summarize this balance table in a plot:

```{r}
plot(s)
```

We can examine the distribution of weights to see to what degree negative weights are present:

```{r}
plot(lmw.out, type = "weights")
```

A red line indicates the average of the weights in that group (the weights are scaled so this is always equal to 1). Negative weights are present in the control group. Weights are also highly variable in the control group, leading to a decreased effective sample size (ESS).

We can further examine extrapolation for specific covariates:

```{r}
plot(lmw.out, type = "extrapolation",
variables = ~married + re75)
```

The $\times$ indicates the mean of the covariate in the target population (the treated group), and the vertical line indicates the mean of the covariate weighted by the implied regression weights. For these covariates, the implied regression weights yield a sample fairly representative of the target population.

We can examine how influential individual points are using their sample influence curves, which are a function of their residuals, leverages, and implied weights:

```{r}
plot(lmw.out, type = "influence", outcome = re78)
```

Finally, we can fit the outcome model and extract the average treatment effect estimates:

```{r}
lmw.fit <- lmw_est(lmw.out, outcome = re78)

summary(lmw.fit)
```

In addition, `lmw` can be used with multi-category treatments, two-stage least squares estimation of instrumental variable models, and doubly-robust estimators by interfacing with the `MatchIt` and `WeightIt` packages to implement these diagnostics for regression in a matched or weighted sample.

### References

Chattopadhyay, A., & Zubizarreta, J. R. (2023). On the implied weights of linear regression for causal inference. *Biometrika*, 110(3), 615–629. https://doi.org/10.1093/biomet/asac058