Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/jackmwolf/tehtuner

An R Package to Fit and Tune to Models to Detect Treatment Effect Heterogeneity
https://github.com/jackmwolf/tehtuner

clinical-trials heterogeneity-of-treatment-effect r subgroup-identification

Last synced: 3 months ago
JSON representation

An R Package to Fit and Tune to Models to Detect Treatment Effect Heterogeneity

Awesome Lists containing this project

README

        

---
output: github_document
editor_options:
chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)

library(ggplot2)
library(ggtext)
library(magrittr)
library(kableExtra)
library(Cairo)
library(rpart.plot)
```

# tehtuner

[![CRAN status](https://www.r-pkg.org/badges/version/tehtuner)](https://CRAN.R-project.org/package=tehtuner)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.05453/status.svg)](https://doi.org/10.21105/joss.05453)
[![R-CMD-check](https://github.com/jackmwolf/tehtuner/workflows/R-CMD-check/badge.svg)](https://github.com/jackmwolf/tehtuner/actions)

The goal of `tehtuner` is to implement methods to fit models to detect and model
treatment effect heterogeneity (TEH) while controlling the Type-I error of falsely
detecting a differential effect when the conditional average treatment effect is
uniform across the study population.

Currently `tehtuner` supports Virtual Twins models (Foster
et al., 2011) for detecting TEH using the permutation procedure proposed in (Wolf et al., 2022).

Virtual Twins is a two-step approach to detecting differential treatment
effects. Subjects' conditional average treatment effects (CATEs) are first
estimated in Step 1 using a flexible model. Then, a simple and interpretable
model is fit in Step 2 to model these estimated CATEs as a function of the
covariates.

The Step 2 model is dependent on some tuning parameter. This parameter is
selected to control the Type-I error rate by permuting the data under the
null hypothesis of a constant treatment effect and identifying the minimal
null penalty parameter (MNPP), which is the smallest penalty parameter that
yields a Step 2 model with no covariate effects. The $1-\alpha$ quantile
of the distribution of is then used to fit the Step 2 model on the original
data.
In dong so, the Type-I error rate is controlled to be $\alpha$.

## Installation

`tehtuner` is available on [CRAN](https://CRAN.R-project.org); you can download the release version with:

``` r
install.packages("tehtuner")
```

You can download the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("jackmwolf/tehtuner")
```
## Example

We consider simulated data from a small clinical trial with 1000 subjects.
Each subject has 10 measured covariates, 8 continuous and 2 binary.
We are interested in estimating and understanding the CATE through Virtual Twins.

```{r}
library(tehtuner)
data("tehtuner_example")
```

We will consider a Virtual Twins model using a random forest to estimate the CATEs in Step 1 and then fitting a regression tree on the estimated CATEs in Step 2 with the Type-I error rate set at $\alpha = 0.2$.

```{r cache=TRUE}
set.seed(100)
vt_cate <- tunevt(
data = tehtuner_example, Y = "Y", Trt = "Trt", step1 = "randomforest",
step2 = "rtree", alpha0 = 0.2, p_reps = 100, ntree = 50
)
vt_cate
```

The fitted Step 2 model can be accessed via `$vtmod`.
In this case, as we used a regression tree in Step 2, our final model model is of class `rpart.object`.
```{r dev='CairoPNG', warning = FALSE}
vt_cate$vtmod

rpart.plot::rpart.plot(vt_cate$vtmod, digits = -2)
```

The fitted model for the CATE is a function of the covariates (`V1`, and `V3`), so we would conclude that there is treatment effect heterogeneity at the 20% level.

We can also look at the null distribution of the MNPP through `vt_cate$theta_null`.
The 80th percentile of $\hat\theta$ under the null hypothesis is
```{r}
quantile(vt_cate$theta_null, 0.8)
```

while the MNPP of our observed data is
```{r}
vt_cate$mnpp
```

The procedure fit the Step 2 model using the 80th quantile of the null distribution which resulted in a model that included covariates since the MNPP was above the 80th quantile.

```{r mnpp_plot, dev='CairoPNG', echo = FALSE}
ggplot(mapping = aes(x = vt_cate$theta_null, y = after_stat(density))) +
geom_histogram(color = "black", fill = "white", binwidth = 0.025) +
theme_minimal() +
scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
geom_vline(
aes(xintercept = c(vt_cate$mnpp, quantile(vt_cate$theta_null, 0.8))),
color = c("#0072B2", "#D55E00"),
linewidth = 2,
linetype = 2
) +
labs(
y = "Density",
x = expression(hat(theta)),
title = expression("Sampling distribution of" ~ hat(theta) ~ "under" ~ H[0]),
subtitle = paste0(
"",
"80th quantile (critical value): ", formatC(quantile(vt_cate$theta_null, 0.8), digits = 2, format = "f"),
"
",
"; ",
"",
"Observed MNPP: ", formatC(vt_cate$mnpp, 2, format = "f"),
"
"
)
) +
theme(
plot.subtitle = element_markdown(size = 12)
)
```

### Running in Parallel

Version `0.2.0` added the `parallel` option to `tunevt()` which allows the user to perform the permutation procedure in parallel to reduce computation times.
Before doing so, you must register a parallel backend; see `?foreach::foreach` for more information.

For example, to carry out 100 permutations across 2 processors:

```{r eval = FALSE}
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)

vt_cate_parallel <- tunevt(
data = tehtuner_example, Y = "Y", Trt = "Trt", step1 = "randomforest",
step2 = "rtree", alpha0 = 0.2, p_reps = 100, ntree = 50, parallel = TRUE
)

parallel::stopCluster(cl)
```

## References

- Foster, J. C., Taylor, J. M., & Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. _Statistics in Medicine, 30_(24), 2867–2880. https://doi.org/10.1002/sim.4322

- Wolf, J. M., Koopmeiners, J. S., & Vock, D. M. (2022). A permutation procedure to detect heterogeneous treatment effects in randomized clinical trials while controlling the type-I error rate. _Clinical Trials, 19_(5). https://doi.org/10.1177/17407745221095855

- Deng C., Wolf J. M., Vock D. M., Carroll D. M., Hatsukami D. K., Leng N., & Koopmeiners J. S. (2023). “Practical guidance on modeling choices for the virtual twins method.” _Journal of Biopharmaceutical Statistics_. https://doi.org/10.1080/10543406.2023.2170404

- Wolf, J. M., (2023). tehtuner: An R package to fit and tune models for the conditional average treatment effect. _Journal of Open Source Software, 8_(86), 5453. https://doi.org/10.21105/joss.05453