https://github.com/s3alfisc/wildrwolf
Romano-Wolf p-value adjustments for multiple hypotheses testing via the wild bootstrap for objects of type fixest and fixest_multi from the fixest package
- Host: GitHub
- URL: https://github.com/s3alfisc/wildrwolf
- Owner: s3alfisc
- License: gpl-3.0
- Created: 2021-06-13T15:46:54.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2024-01-14T11:28:25.000Z (over 2 years ago)
- Last Synced: 2024-03-14T22:10:21.913Z (over 2 years ago)
- Topics: fixest, multiple-comparisons, r, romano-wolf, wild-bootstrap, wild-cluster-bootstrap
- Language: R
- Homepage: https://s3alfisc.github.io/wildrwolf/
- Size: 4.56 MB
- Stars: 6
- Watchers: 3
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
README
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# wildrwolf `r emo::ji("wolf")`
The `wildrwolf` package implements Romano-Wolf multiple-hypothesis-adjusted p-values for objects of type `fixest` and `fixest_multi` from the `fixest` package via a wild (cluster) bootstrap.
Because the bootstrap resampling builds on the [fwildclusterboot](https://github.com/s3alfisc/fwildclusterboot) package, `wildrwolf` is usually fast.
The package is complementary to [wildwyoung](https://github.com/s3alfisc/wildwyoung) (still a work in progress), which implements the multiple hypothesis adjustment method of Westfall and Young (1993).
Support for multi-way clustering is a work in progress.
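To make the adjustment itself more concrete, here is a minimal base R sketch of a step-down procedure of the Romano-Wolf type. It is not the code `wildrwolf` runs internally: the function name `romano_wolf_sketch()` and its arguments `t_stats` and `boot_t_stats` are hypothetical, and the bootstrap statistics are assumed to have been generated under the null already (in the package, that is the job of the wild bootstrap).

```r
# Sketch only: step-down adjusted p-values from observed and bootstrapped
# t-statistics. `t_stats` has length S; `boot_t_stats` is a B x S matrix of
# bootstrap t-statistics, assumed to be drawn under the null hypotheses.
romano_wolf_sketch <- function(t_stats, boot_t_stats) {
  stopifnot(is.matrix(boot_t_stats), ncol(boot_t_stats) == length(t_stats))
  S <- length(t_stats)
  B <- nrow(boot_t_stats)
  abs_t <- abs(t_stats)
  abs_boot <- abs(boot_t_stats)

  # order hypotheses from most to least significant
  ord <- order(abs_t, decreasing = TRUE)

  p_init <- numeric(S)
  for (j in seq_len(S)) {
    # hypotheses not yet "stepped past"
    remaining <- ord[j:S]
    # per-draw maximum bootstrap statistic over the remaining hypotheses
    max_boot <- apply(abs_boot[, remaining, drop = FALSE], 1, max)
    p_init[j] <- (sum(max_boot >= abs_t[ord[j]]) + 1) / (B + 1)
  }

  # enforce monotonicity along the ordering, then restore the input order
  p_adj <- numeric(S)
  p_adj[ord] <- cummax(p_init)
  p_adj
}

# toy inputs, for illustration only
set.seed(1)
t_obs  <- c(4.2, 0.3, 2.1)
t_boot <- matrix(rnorm(999 * 3), nrow = 999)
p_sketch <- romano_wolf_sketch(t_obs, t_boot)
p_sketch
```

Taking the maximum over all remaining hypotheses is what lets the procedure account for the dependence between test statistics, which a Bonferroni or Holm correction ignores.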
## Installation
You can install the released version from CRAN, the development version from [GitHub](https://github.com/), or a compiled build from r-universe:
``` r
# released version from CRAN
install.packages("wildrwolf")
# development version from GitHub
# install.packages("devtools")
devtools::install_github("s3alfisc/wildrwolf")
# from r-universe (Windows & macOS, compiled, R > 4.0 required)
install.packages("wildrwolf", repos = "https://s3alfisc.r-universe.dev")
```
## Example I
```{r, warning=FALSE, message = FALSE}
library(wildrwolf)
library(fixest)

set.seed(1412)
N <- 1000
X1 <- rnorm(N)
X2 <- rnorm(N)

# error terms, correlated across the four outcomes
rho <- 0.5
sigma <- matrix(rho, 4, 4); diag(sigma) <- 1
u <- MASS::mvrnorm(n = N, mu = rep(0, 4), Sigma = sigma)

# X1 has a large effect on Y1, a moderate effect on Y3,
# and a near-zero effect on Y2 and Y4
Y1 <- 1 + 1 * X1 + X2 + u[, 1]
Y2 <- 1 + 0.01 * X1 + X2 + u[, 2]
Y3 <- 1 + 0.4 * X1 + X2 + u[, 3]
Y4 <- 1 + -0.02 * X1 + X2 + u[, 4]

data <- data.frame(Y1 = Y1,
                   Y2 = Y2,
                   Y3 = Y3,
                   Y4 = Y4,
                   X1 = X1,
                   X2 = X2,
                   splitvar = sample(1:2, N, TRUE))

# 4 outcomes x 2 specifications (X1 only, X1 + X2) = 8 models
fit <- feols(c(Y1, Y2, Y3, Y4) ~ csw(X1, X2),
             data = data,
             se = "hetero",
             ssc = ssc(cluster.adj = TRUE))

# clean workspace except for fit & data
rm(list = setdiff(ls(), c("fit", "data")))

res_rwolf1 <- wildrwolf::rwolf(
  models = fit,
  param = "X1",
  B = 9999
)

# unadjusted p-values for X1, for comparison
pvals <- lapply(fit, function(x) pvalue(x)["X1"]) |> unlist()

# Romano-Wolf corrected p-values
res_rwolf1
```
## Example II
```{r, warning = FALSE, message = FALSE}
fit1 <- feols(Y1 ~ X1 , data = data)
fit2 <- feols(Y1 ~ X1 + X2, data = data)
fit3 <- feols(Y2 ~ X1, data = data)
fit4 <- feols(Y2 ~ X1 + X2, data = data)
res_rwolf2 <- rwolf(
  models = list(fit1, fit2, fit3, fit4),
  param = "X1",
  B = 9999
)
res_rwolf2
```
## Performance
The procedure from Example I, with `S = 8` hypotheses, `N = 1000` observations, and `k %in% c(1, 2)` regressors per model, finishes in around 5 seconds.
```{r, warning = FALSE, message = FALSE}
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  microbenchmark::microbenchmark(
    "Romano-Wolf" = wildrwolf::rwolf(
      models = fit,
      param = "X1",
      B = 9999
    ),
    times = 1
  )
}
```
## But does it work? Monte Carlo Experiments
We test $S=6$ hypotheses and generate data as
$$Y_{i,s,g} = \beta_{0} + \beta_{1,s} D_{i} + u_{i,g} + \epsilon_{i,s} $$
where $D_i = 1(U_i > 0.5)$ with $U_i$ drawn from a uniform distribution, $u_{i,g}$ is a cluster-level shock with intra-cluster correlation $0.5$, and the idiosyncratic error term $\epsilon_{i,s}$ is drawn from a multivariate normal distribution with mean $0_S$ and covariance matrix
```{r}
S <- 6
rho <- 0.5
Sigma <- matrix(rho, S, S)
diag(Sigma) <- 1
Sigma
```
with $\rho \geq 0$. We assume that $\beta_{1,s}= 0$ for all $s$.
This experiment uses the data generating process of equation (9) in [Clarke, Romano and Wolf](https://docs.iza.org/dp12845.pdf), extended by the cluster-level error term $u_{i,g}$, with $G=20$ clusters, an intra-cluster correlation of 0.5, and $N=1000$ observations.
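For readers who want to see the data generating process spelled out, the following base R sketch simulates one data set of this form. It is an illustration under stated assumptions, not the code behind the package's own simulation: the function `simulate_dgp_sketch()` and its arguments are hypothetical, `beta0 = 1` is an arbitrary choice, and clusters are assigned at random. A unit-variance cluster shock combined with unit-variance idiosyncratic errors is one way to obtain an intra-cluster correlation of 0.5.

```r
# Sketch only: one draw from a DGP of the form
#   Y[i, s] = beta0 + beta1[s] * D[i] + u[g(i)] + eps[i, s]
simulate_dgp_sketch <- function(N = 1000, S = 6, G = 20, rho = 0.5,
                                beta0 = 1, beta1 = rep(0, S)) {
  # covariance of the idiosyncratic errors across the S outcomes
  Sigma <- matrix(rho, S, S)
  diag(Sigma) <- 1

  # N x S normal draws with covariance Sigma (via the Cholesky factor)
  eps <- matrix(rnorm(N * S), N, S) %*% chol(Sigma)

  # treatment indicator D_i = 1(U_i > 0.5)
  D <- as.numeric(runif(N) > 0.5)

  # assumption: random cluster membership and a N(0, 1) cluster shock,
  # so that var(u) / (var(u) + var(eps)) = 0.5 for every outcome
  cluster <- sample(seq_len(G), N, replace = TRUE)
  u <- rnorm(G)[cluster]

  # u (length N) is recycled down the columns of the N x S matrix
  Y <- beta0 + outer(D, beta1) + u + eps
  colnames(Y) <- paste0("Y", seq_len(S))

  data.frame(Y, D = D, cluster = cluster)
}

set.seed(76)
sim_data <- simulate_dgp_sketch()
head(sim_data)
```

With `beta1` set to zero for every outcome, all six null hypotheses are true, so any rejection in such a data set is a false rejection.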
You can run the simulations via the `run_fwer_sim()` function included in the package.
```{r, message = FALSE, results = "hide"}
# note that this will take some time
res <- run_fwer_sim(
  seed = 76,
  n_sims = 1000,
  B = 499,
  N = 1000,
  s = 6,
  rho = 0.5 # correlation between hypotheses, not intra-cluster!
)
```
Both Holm's method and `wildrwolf` control the family-wise error rate at both the 5% and 10% significance levels.
```{r}
res
```
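As a reminder of what is being measured, the family-wise error rate is the probability of at least one false rejection. The sketch below estimates it for toy data in base R: independent uniform p-values stand in for the simulation output, `empirical_fwer()` is a hypothetical helper, and Holm's correction comes from `stats::p.adjust()`. It does not reproduce the numbers above.

```r
# Sketch only: share of simulated data sets with at least one rejection.
# This estimates the FWER only when every null hypothesis is true.
empirical_fwer <- function(p_adj, alpha) {
  mean(apply(p_adj, 1, min) < alpha)
}

set.seed(123)
n_sims <- 2000
S <- 6

# toy stand-in for simulation output: independent p-values under the null
p_raw <- matrix(runif(n_sims * S), n_sims, S)

# Holm adjustment within each simulated data set (one row = one data set)
p_holm <- t(apply(p_raw, 1, p.adjust, method = "holm"))

fwer_raw  <- empirical_fwer(p_raw, alpha = 0.05)
fwer_holm <- empirical_fwer(p_holm, alpha = 0.05)
c(unadjusted = fwer_raw, holm = fwer_holm)
```

Without any correction, six independent tests at the 5% level reject at least one true null with probability $1 - 0.95^6 \approx 0.26$, whereas the Holm-adjusted p-values should keep that rate near or below 0.05.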
## Comparison with Stata's rwolf package
```{r, eval = FALSE}
library(RStata)
# configure RStata
options("RStata.StataPath" = "\"C:\\Program Files\\Stata17\\StataBE-64\"")
options("RStata.StataVersion" = 17)
# save the data set so it can be loaded into Stata
write.csv(data, "c:/Users/alexa/Dropbox/rwolf/inst/extdata/readme.csv")
# estimate with Stata via RStata
stata_program <- "
clear
set more off
import delimited c:/Users/alexa/Dropbox/rwolf/inst/extdata/readme.csv
set seed 1
rwolf y1 y2 y3 y4, indepvar(x1) controls(x2) reps(9999)
"
RStata::stata(stata_program, data.out = TRUE)
# Romano-Wolf step-down adjusted p-values
#
#
# Independent variable: x1
# Outcome variables: y1 y2 y3 y4
# Number of resamples: 9999
#
#
# ------------------------------------------------------------------------------
# Outcome Variable | Model p-value Resample p-value Romano-Wolf p-value
# --------------------+---------------------------------------------------------
# y1 | 0.0000 0.0001 0.0001
# y2 | 0.3904 0.3755 0.6070
# y3 | 0.0000 0.0001 0.0001
# y4 | 0.9586 0.9596 0.9596
# ------------------------------------------------------------------------------
```
For comparison, `wildrwolf` produces the following output:
```{r, warning = FALSE, message = FALSE, eval = FALSE}
models <- feols(c(Y1, Y2, Y3, Y4) ~ X1 + X2
, data = data, se = "hetero")
```
```{r, include = FALSE}
models <- feols(c(Y1, Y2, Y3, Y4) ~ X1 + X2
, data = data, se = "hetero")
```
```{r}
rwolf(models, param = "X1", B = 9999)
```