https://github.com/benkeser/sievetrend
Tests for trends in vaccine efficacy by genetic distance
https://github.com/benkeser/sievetrend
competing-risks hamming-distance sieve-analysis tmle vaccines
Last synced: about 2 months ago
JSON representation
Tests for trends in vaccine efficacy by genetic distance
- Host: GitHub
- URL: https://github.com/benkeser/sievetrend
- Owner: benkeser
- License: other
- Created: 2017-09-28T19:26:08.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-09-28T19:58:41.000Z (over 7 years ago)
- Last Synced: 2025-02-09T09:11:14.066Z (3 months ago)
- Topics: competing-risks, hamming-distance, sieve-analysis, tmle, vaccines
- Language: R
- Size: 25.4 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
Awesome Lists containing this project
README
---
title: "Assessing trends in vaccine efficacy by pathogen genetic distance"
author: "David Benkeser, Michal Juraska, and Peter Gilbert"
date: "September 28, 2017"
output:
md_document:
variant: markdown_github
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```## Installation of `sievetrend`
The `sievetrend` repository is available for download as an `R` package and may be
downloaded directly from GitHub as follows.```{r, messages = FALSE, warnings = FALSE}
# install sievetrend from GitHub
devtools::install_github("benkeser/sievetrend")
library(sievetrend)
# also will require package survtmle
if(!("survtmle" %in% row.names(installed.packages()))){
install.packages("survtmle")
}
library(survtmle)
```Below, we include the code that was used to perform the simulation study
and to analyze the RTS,S data for the manuscript.## Simulation
Here, we demonstrate how to obtain results for a simulated data set.
```{r}
# sample size
n <- 1000# set random seed
set.seed(1234)# simulate data set
dat <- makeData(n = n)# get the formula for the empirical estimate of the
# censoring distribution needed for input to survtmle
glm.ctime <- get.ctimeForm(trt = dat$trt, site = dat$adjustVars$site,
ftime = dat$ftime, ftype = dat$ftype)# call survtmle to estimate cumulative incidence
object <- survtmle(ftime = dat$ftime,
ftype = dat$ftype,
adjustVars = dat$adjustVars,
trt = dat$trt,
glm.trt = "1",
glm.ftime = "trt*factor(site)",
glm.ctime = glm.ctime,
method = "mean",
t0=6)# call trend_test to estimate projection
trend <- trend_test(object)
trend
```A numerical approximation of the true value of $\beta_{0,n}$ may be obtained as follows.
```{r}
# get covariance matrix estimate
nabla_g <- sievetrend:::grad_g(object$est)
Upsilon_n <- nabla_g %*% cov(Reduce(cbind, object$ic)) %*% t(nabla_g)
# call getTruth function with this estimate several times
# and average (to increase accuracy)
beta_0n <- rowMeans(replicate(20, getTruth(Upsilon = Upsilon_n, n = 1e6)))[2]# compare estimate to truth
c(est = trend$beta, truth = beta_0n)
```The full code used to execute the simulation study is included in the simulation subdirectory. This includes the script `cent.R`, which is batched to a slurm via
the script `sce.sh`. The function `makeIllustrationPlot` was used to produce Figure 1.## RTS,S Analysis
Due to existing privacy agreements, it is difficult to obtain access to the real RTS,S data. Nevertheless, a mock data set is distributed with the `survtmle` package that will serve as illustration for the outputation methodology. See `?rtss` for further description of the data set.
The `rtss` data set is a list of ten data sets each representing an outputed data set. They are formatted for analysis of a binary genetic mark, so we first replace the failure type column with a simulated genetic distance.
```{r}
# replace the ftype column in each data set
rtss_mod <- lapply(rtss, function(data){
# which were observed failures
fail_idx <- which(data$ftype > 0)
# number of observed failures
fail_n <- length(fail_idx)
# replace with a simulated version
data$ftype[fail_idx] <- rbinom(fail_n, 4, plogis(data$vaccine)) + 1
# collapse site variable into a single column
data$site <- as.numeric(1*(data$site1==1) + 2*(data$site2==1) +
3*(data$site3==1) + 4*(data$site4==1) +
5*(data$site5==1))
return(data)
})# check out new ftype distribution for first
# outputed data set
table(rtss_mod[[1]]$ftype)
```Now we use `survtmle` to estimate the cumulative incidence for each data set.
```{r}
rslt <- lapply(rtss_mod, function(data){
glm.ctime <- get.ctimeForm(trt = data$trt, site = data$site,
ftime = data$ftime, ftype = data$ftype)object <- survtmle(ftime = data$ftime,
ftype = data$ftype,
adjustVars = data[,"site",drop=FALSE],
trt = data$vaccine,
glm.trt = "1",
glm.ftime = "trt*factor(site)",
glm.ctime = glm.ctime,
method = "mean",
t0=6)
return(object)
})
```We can use the `getMO` function to obtain the averaged cumulative incidence
results and apply the trend_test on this scale.```{r}
# obtain averaged results
mo_rslt <- getMO(rslt)# apply trend test
mo_trend <- trend_test(mo_rslt)
mo_trend
```