https://github.com/jackmwolf/tehtuner

An R Package to Fit and Tune to Models to Detect Treatment Effect Heterogeneity
https://github.com/jackmwolf/tehtuner

clinical-trials heterogeneity-of-treatment-effect r subgroup-identification

Last synced: about 2 months ago
JSON representation

An R Package to Fit and Tune to Models to Detect Treatment Effect Heterogeneity

Host: GitHub
URL: https://github.com/jackmwolf/tehtuner
Owner: jackmwolf
License: gpl-3.0
Created: 2021-11-06T21:50:51.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2025-04-09T23:09:16.000Z (2 months ago)
Last Synced: 2025-04-10T00:22:02.070Z (2 months ago)
Topics: clinical-trials, heterogeneity-of-treatment-effect, r, subgroup-identification
Language: R
Homepage:
Size: 1010 KB
Stars: 5
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md

Awesome Lists containing this project

README

        ---

output: github_document

editor_options: 

  chunk_output_type: console

---

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "100%"

)

library(ggplot2)

library(ggtext)

library(magrittr)

library(kableExtra)

library(Cairo)

library(rpart.plot)

```

# tehtuner

[![CRAN status](https://www.r-pkg.org/badges/version/tehtuner)](https://CRAN.R-project.org/package=tehtuner)

[![DOI](https://joss.theoj.org/papers/10.21105/joss.05453/status.svg)](https://doi.org/10.21105/joss.05453)

[![R-CMD-check](https://github.com/jackmwolf/tehtuner/workflows/R-CMD-check/badge.svg)](https://github.com/jackmwolf/tehtuner/actions)

The goal of `tehtuner` is to implement methods to fit models to detect and model

treatment effect heterogeneity (TEH) while controlling the Type-I error of falsely 

detecting a differential effect when the conditional average treatment effect is

uniform across the study population.

Currently `tehtuner` supports Virtual Twins models (Foster 

et al., 2011) for detecting TEH using the permutation procedure proposed in (Wolf et al., 2022).

Virtual Twins is a two-step approach to detecting differential treatment

effects. Subjects' conditional average treatment effects (CATEs) are first

estimated in Step 1 using a flexible model. Then, a simple and interpretable

model is fit in Step 2 to model these estimated CATEs as a function of the

covariates.

The Step 2 model is dependent on some tuning parameter. This parameter is

selected to control the Type-I error rate by permuting the data under the

null hypothesis of a constant treatment effect and identifying the minimal

null penalty parameter (MNPP), which is the smallest penalty parameter that

yields a Step 2 model with no covariate effects. The $1-\alpha$ quantile

of the distribution of is then used to fit the Step 2 model on the original

data.

In dong so, the Type-I error rate is controlled to be $\alpha$.

## Installation

`tehtuner` is available on [CRAN](https://CRAN.R-project.org); you can download the release version with:

``` r

install.packages("tehtuner")

```

You can download the development version from [GitHub](https://github.com/) with:

``` r

# install.packages("devtools")

devtools::install_github("jackmwolf/tehtuner")

```

## Example

We consider simulated data from a small clinical trial with 1000 subjects.

Each subject has 10 measured covariates, 8 continuous and 2 binary.

We are interested in estimating and understanding the CATE through Virtual Twins.

```{r}

library(tehtuner)

data("tehtuner_example")

```

We will consider a Virtual Twins model using a random forest to estimate the CATEs in Step 1 and then fitting a regression tree on the estimated CATEs in Step 2 with the Type-I error rate set at $\alpha = 0.2$.

```{r cache=TRUE}

set.seed(100)

vt_cate <- tunevt(

  data = tehtuner_example, Y = "Y", Trt = "Trt", step1 = "randomforest",

  step2 = "rtree", alpha0 = 0.2, p_reps = 100, ntree = 50

)

vt_cate

```

The fitted Step 2 model can be accessed via `$vtmod`. 

In this case, as we used a regression tree in Step 2, our final model model is of class `rpart.object`.

```{r dev='CairoPNG', warning = FALSE}

vt_cate$vtmod

rpart.plot::rpart.plot(vt_cate$vtmod, digits = -2)

```

The fitted model for the CATE is a function of the covariates (`V1`, and `V3`), so we would conclude that there is treatment effect heterogeneity at the 20% level. 

We can also look at the null distribution of the MNPP through `vt_cate$theta_null`.

The 80th percentile of $\hat\theta$ under the null hypothesis is

```{r}

quantile(vt_cate$theta_null, 0.8)

```

while the MNPP of our observed data is

```{r}

vt_cate$mnpp

```

The procedure fit the Step 2 model using the 80th quantile of the null distribution which resulted in a model that included covariates since the MNPP was above the 80th quantile.

```{r mnpp_plot, dev='CairoPNG', echo = FALSE}

ggplot(mapping = aes(x = vt_cate$theta_null, y = after_stat(density))) +

  geom_histogram(color = "black", fill = "white", binwidth = 0.025) +

  theme_minimal() +

  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +

  geom_vline(

    aes(xintercept = c(vt_cate$mnpp, quantile(vt_cate$theta_null, 0.8))),

    color = c("#0072B2", "#D55E00"),

    linewidth = 2,

    linetype = 2

  ) + 

  labs(

    y = "Density",

    x = expression(hat(theta)),

    title =  expression("Sampling distribution of" ~ hat(theta) ~ "under" ~ H[0]),

    subtitle = paste0(

      "",

        "80th quantile (critical value): ", formatC(quantile(vt_cate$theta_null, 0.8), digits = 2, format = "f"),

      "",

      "; ",

      "",

        "Observed MNPP: ", formatC(vt_cate$mnpp, 2, format = "f"), 

      ""

      )

  ) +

  theme(

    plot.subtitle = element_markdown(size = 12)

  )

```

### Running in Parallel

Version `0.2.0` added the `parallel` option to `tunevt()` which allows the user to perform the permutation procedure in parallel to reduce computation times. 

Before doing so, you must register a parallel backend; see `?foreach::foreach` for more information.

For example, to carry out 100 permutations across 2 processors:

```{r eval = FALSE}

cl <- parallel::makeCluster(2)

doParallel::registerDoParallel(cl)

vt_cate_parallel <- tunevt(

  data = tehtuner_example, Y = "Y", Trt = "Trt", step1 = "randomforest",

  step2 = "rtree", alpha0 = 0.2, p_reps = 100, ntree = 50, parallel = TRUE

)

parallel::stopCluster(cl)

```

## References

- Foster, J. C., Taylor, J. M., & Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. _Statistics in Medicine, 30_(24), 2867–2880. https://doi.org/10.1002/sim.4322

- Wolf, J. M., Koopmeiners, J. S., & Vock, D. M. (2022). A permutation procedure to detect heterogeneous treatment effects in randomized clinical trials while controlling the type-I error rate. _Clinical Trials, 19_(5). https://doi.org/10.1177/17407745221095855

- Deng C., Wolf J. M., Vock D. M., Carroll D. M., Hatsukami D. K., Leng N., & Koopmeiners J. S. (2023). “Practical guidance on modeling choices for the virtual twins method.” _Journal of Biopharmaceutical Statistics_. https://doi.org/10.1080/10543406.2023.2170404

- Wolf, J. M., (2023). tehtuner: An R package to fit and tune models for the conditional average treatment effect. _Journal of Open Source Software, 8_(86), 5453. https://doi.org/10.21105/joss.05453

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jackmwolf/tehtuner

Awesome Lists containing this project

README