https://github.com/s3alfisc/wildrwolf

Romano-Wolf p-value adjustments for multiple hypotheses testing via the wild bootstrap for objects of type fixest and fixest_multi from the fixest package

fixest multiple-comparisons r romano-wolf wild-bootstrap wild-cluster-bootstrap

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# wildrwolf `r emo::ji("wolf")`

[![R-CMD-check](https://github.com/s3alfisc/wildrwolf/workflows/R-CMD-check/badge.svg)](https://github.com/s3alfisc/wildrwolf/actions)
[![](http://cranlogs.r-pkg.org/badges/last-month/wildrwolf)](https://cran.r-project.org/package=wildrwolf)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![](https://www.r-pkg.org/badges/version/wildrwolf)](https://cran.r-project.org/package=wildrwolf)
![runiverse-package](https://s3alfisc.r-universe.dev/badges/wildrwolf)
[![Codecov test coverage](https://codecov.io/gh/s3alfisc/wildrwolf/branch/main/graph/badge.svg)](https://app.codecov.io/gh/s3alfisc/wildrwolf?branch=main)

The `wildrwolf` package implements Romano-Wolf multiple-hypothesis-adjusted p-values for objects of type `fixest` and `fixest_multi` from the `fixest` package via a wild (cluster) bootstrap.

Because the bootstrap resampling is handled by the [fwildclusterboot](https://github.com/s3alfisc/fwildclusterboot) package, `wildrwolf` is usually fast.

The package is complementary to [wildwyoung](https://github.com/s3alfisc/wildwyoung) (still a work in progress), which implements the multiple-hypothesis adjustment method of Westfall and Young (1993).

Support for multi-way clustering is a work in progress.
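
To make the adjustment concrete, here is a minimal base-R sketch of a Romano-Wolf step-down p-value computation. It assumes the original t-statistics and a matrix of bootstrap t-statistics (already centered under the null) are at hand. The function `rw_stepdown()` and its arguments are hypothetical names chosen for illustration; this is not `wildrwolf`'s internal code.

```r
# Hypothetical helper for illustration, not part of the wildrwolf API.
# t_stat: numeric vector with the S original t-statistics
# t_boot: B x S matrix of bootstrap t-statistics, assumed centered under the null
rw_stepdown <- function(t_stat, t_boot) {
  S <- length(t_stat)
  B <- nrow(t_boot)
  abs_t <- abs(t_stat)
  abs_boot <- abs(t_boot)

  # order hypotheses from most to least significant
  ord <- order(abs_t, decreasing = TRUE)

  p_init <- numeric(S)
  for (j in seq_len(S)) {
    # hypotheses not yet "stepped past": the j-th most significant and all below it
    remaining <- ord[j:S]
    # per bootstrap draw, the largest |t*| among the remaining hypotheses
    max_boot <- apply(abs_boot[, remaining, drop = FALSE], 1, max)
    p_init[j] <- (sum(max_boot >= abs_t[ord[j]]) + 1) / (B + 1)
  }

  # enforce monotonicity along the ordering, then map back to the input order
  p_adj <- numeric(S)
  p_adj[ord] <- cummax(p_init)
  p_adj
}

# toy usage with simulated "bootstrap" draws (illustrative numbers only)
set.seed(1)
t_boot <- matrix(rnorm(999 * 3), nrow = 999, ncol = 3)
rw_stepdown(t_stat = c(4.2, 0.3, 2.1), t_boot = t_boot)
```

Taking the maximum over the remaining hypotheses at each step is what lets the procedure exploit the dependence between test statistics, which is why it tends to be less conservative than Holm's method.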

## Installation

You can install the released version from CRAN, or the development version from [GitHub](https://github.com/) or r-universe, with:

``` r
install.packages("wildrwolf")

# install.packages("devtools")
devtools::install_github("s3alfisc/wildrwolf")

# from r-universe (windows & mac, compiled R > 4.0 required)
install.packages('wildrwolf', repos ='https://s3alfisc.r-universe.dev')
```

## Example I

```{r, warning=FALSE, message = FALSE}
library(wildrwolf)
library(fixest)

set.seed(1412)

N <- 1000
X1 <- rnorm(N)
X2 <- rnorm(N)
rho <- 0.5
sigma <- matrix(rho, 4, 4); diag(sigma) <- 1
u <- MASS::mvrnorm(n = N, mu = rep(0, 4), Sigma = sigma)
Y1 <- 1 + 1 * X1 + X2
Y2 <- 1 + 0.01 * X1 + X2
Y3 <- 1 + 0.4 * X1 + X2
Y4 <- 1 + -0.02 * X1 + X2
for (x in 1:4) {
  var_char <- paste0("Y", x)
  assign(var_char, get(var_char) + u[, x])
}

data <- data.frame(
  Y1 = Y1,
  Y2 = Y2,
  Y3 = Y3,
  Y4 = Y4,
  X1 = X1,
  X2 = X2,
  splitvar = sample(1:2, N, TRUE)
)

# 4 outcomes x 2 cumulative specifications (X1; X1 + X2) = 8 models
fit <- feols(
  c(Y1, Y2, Y3, Y4) ~ csw(X1, X2),
  data = data,
  se = "hetero",
  ssc = ssc(cluster.adj = TRUE)
)

# clean workspace except for fit & data
rm(list = ls()[!(ls() %in% c("fit", "data"))])

res_rwolf1 <- wildrwolf::rwolf(
  models = fit,
  param = "X1",
  B = 9999
)

# unadjusted p-values for X1, kept for reference
pvals <- lapply(fit, function(x) pvalue(x)["X1"]) |> unlist()

# Romano-Wolf Corrected P-values
res_rwolf1

```

## Example II

```{r, warning = FALSE, message = FALSE}
fit1 <- feols(Y1 ~ X1 , data = data)
fit2 <- feols(Y1 ~ X1 + X2, data = data)
fit3 <- feols(Y2 ~ X1, data = data)
fit4 <- feols(Y2 ~ X1 + X2, data = data)

res_rwolf2 <- rwolf(
  models = list(fit1, fit2, fit3, fit4),
  param = "X1",
  B = 9999
)
res_rwolf2
```

## Performance

Example I above, with `S = 8` hypotheses, `N = 1000` observations, and `k %in% c(1, 2)` parameters per model, finishes in around 5 seconds.

```{r, warning = FALSE, message = FALSE}
if (requireNamespace("microbenchmark")) {
  microbenchmark::microbenchmark(
    "Romano-Wolf" = wildrwolf::rwolf(
      models = fit,
      param = "X1",
      B = 9999
    ),
    times = 1
  )
}
```

## But does it work? Monte Carlo Experiments

We test $S=6$ hypotheses and generate data as

$$Y_{i,s,g} = \beta_{0} + \beta_{1,s} D_{i} + u_{i,g} + \epsilon_{i,s} $$
where $D_i = 1(U_i > 0.5)$ with $U_i$ drawn from a uniform distribution, $u_{i,g}$ is a cluster-level shock with intra-cluster correlation $0.5$, and the idiosyncratic error term $\epsilon_{i,s}$ is drawn from a multivariate normal distribution with mean $0_S$ and covariance matrix

```{r}
S <- 6
rho <- 0.5
Sigma <- matrix(rho, S, S)
diag(Sigma) <- 1
Sigma
```

with $\rho \geq 0$. We assume that $\beta_{1,s}= 0$ for all $s$.

This experiment follows the data generating process in equation (9) of [Clarke, Romano and Wolf](https://docs.iza.org/dp12845.pdf), with an additional error term $u_{i,g}$ for $G=20$ clusters, an intra-cluster correlation of 0.5, and $N=1000$ observations.
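
As a rough illustration of this design, the following base-R sketch draws one data set. It is not the code used by `run_fwer_sim()`: the function name `simulate_dgp()`, the intercept value, the random cluster assignment, the use of `runif()` on $[0, 1]$, and the way the cluster shock is built to reach the target intra-cluster correlation are all assumptions made for this example.

```r
# Hypothetical helper illustrating the design; not wildrwolf's simulation code.
simulate_dgp <- function(N = 1000, S = 6, G = 20, rho = 0.5, icc = 0.5,
                         beta0 = 1,            # assumed intercept value
                         beta1 = rep(0, S)) {  # all nulls true
  # covariance matrix of the idiosyncratic errors across the S outcomes
  Sigma <- matrix(rho, S, S)
  diag(Sigma) <- 1

  # N x S draws from N(0, Sigma) via the Cholesky factor (t(R) %*% R = Sigma)
  eps <- matrix(rnorm(N * S), N, S) %*% chol(Sigma)

  # treatment indicator D_i = 1(U_i > 0.5), with U_i assumed uniform on [0, 1]
  D <- as.integer(runif(N) > 0.5)

  # assumed construction: a common cluster draw plus individual noise,
  # weighted so that the within-cluster correlation of u equals icc
  cluster <- sample(seq_len(G), N, replace = TRUE)
  u <- sqrt(icc) * rnorm(G)[cluster] + sqrt(1 - icc) * rnorm(N)

  # outcome s for unit i: beta0 + beta1[s] * D[i] + u[i] + eps[i, s]
  Y <- beta0 + outer(D, beta1) + u + eps
  colnames(Y) <- paste0("Y", seq_len(S))

  data.frame(Y, D = D, cluster = cluster)
}

set.seed(76)
sim_data <- simulate_dgp()
head(sim_data)
```
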

You can run the simulations via the `run_fwer_sim()` function included in the package.

```{r, message = FALSE, results = "hide"}
# note that this will take some time
res <- run_fwer_sim(
  seed = 76,
  n_sims = 1000,
  B = 499,
  N = 1000,
  s = 6,
  rho = 0.5 # correlation between hypotheses, not intra-cluster!
)
```

Both Holm's method and `wildrwolf` control the family-wise error rate at both the 5% and 10% significance levels.

```{r}
res
```
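
For readers unfamiliar with the benchmark, Holm's correction is available in base R through `p.adjust(method = "holm")`, and an empirical family-wise error rate is the share of simulation draws with at least one rejection. The sketch below uses independent uniform p-values under the null, which is a simplifying assumption and not the correlated design of `run_fwer_sim()`; the helper `fwer()` is a hypothetical name.

```r
set.seed(76)
n_sims <- 1000
S <- 6

# illustrative null p-values: independent and uniform (an assumption)
p_raw <- matrix(runif(n_sims * S), nrow = n_sims, ncol = S)

# Holm adjustment within each simulated family of S hypotheses;
# apply() returns an S x n_sims matrix, so transpose back to n_sims x S
p_holm <- t(apply(p_raw, 1, p.adjust, method = "holm"))

# hypothetical helper: share of draws with at least one (false) rejection
fwer <- function(p, alpha) {
  mean(apply(p, 1, function(x) any(x < alpha)))
}

c(
  unadjusted = fwer(p_raw, alpha = 0.05),
  holm = fwer(p_holm, alpha = 0.05)
)
```

Without adjustment, the chance of at least one false rejection across six independent tests at the 5% level is $1 - 0.95^6 \approx 0.26$, which is the inflation that both Holm's method and the Romano-Wolf procedure are designed to remove.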

## Comparison with Stata's rwolf package

```{r, eval = FALSE}
library(RStata)
# initiate RStata
options("RStata.StataPath" = "\"C:\\Program Files\\Stata17\\StataBE-64\"")
options("RStata.StataVersion" = 17)
# save the data set so it can be loaded into Stata
write.csv(data, "c:/Users/alexa/Dropbox/rwolf/inst/extdata/readme.csv")

# estimate with Stata via RStata
stata_program <- "
clear
set more off
import delimited c:/Users/alexa/Dropbox/rwolf/inst/extdata/readme.csv
set seed 1
rwolf y1 y2 y3 y4, indepvar(x1) controls(x2) reps(9999)
"
RStata::stata(stata_program, data.out = TRUE)

# Romano-Wolf step-down adjusted p-values
#
#
# Independent variable: x1
# Outcome variables: y1 y2 y3 y4
# Number of resamples: 9999
#
#
# ------------------------------------------------------------------------------
#    Outcome Variable | Model p-value    Resample p-value    Romano-Wolf p-value
# --------------------+---------------------------------------------------------
#                  y1 |    0.0000             0.0001               0.0001
#                  y2 |    0.3904             0.3755               0.6070
#                  y3 |    0.0000             0.0001               0.0001
#                  y4 |    0.9586             0.9596               0.9596
# ------------------------------------------------------------------------------

```

For comparison, `wildrwolf` produces the following output:

```{r, warning = FALSE, message = FALSE, eval = FALSE}
models <- feols(c(Y1, Y2, Y3, Y4) ~ X1 + X2
, data = data, se = "hetero")
```

```{r, include = FALSE}
models <- feols(c(Y1, Y2, Y3, Y4) ~ X1 + X2
, data = data, se = "hetero")
```

```{r}
rwolf(models, param = "X1", B = 9999)

```