An open API service indexing awesome lists of open source software.

https://github.com/benkeser/sievetrend

Tests for trends in vaccine efficacy by genetic distance
https://github.com/benkeser/sievetrend

competing-risks hamming-distance sieve-analysis tmle vaccines

Last synced: about 2 months ago
JSON representation

Tests for trends in vaccine efficacy by genetic distance

Awesome Lists containing this project

README

        

---
title: "Assessing trends in vaccine efficacy by pathogen genetic distance"
author: "David Benkeser, Michal Juraska, and Peter Gilbert"
date: "September 28, 2017"
output:
md_document:
variant: markdown_github
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Installation of `sievetrend`

The `sievetrend` repository is available for download as an `R` package and may be
downloaded directly from GitHub as follows.

```{r, messages = FALSE, warnings = FALSE}
# install sievetrend from GitHub
devtools::install_github("benkeser/sievetrend")
library(sievetrend)
# also will require package survtmle
if(!("survtmle" %in% row.names(installed.packages()))){
install.packages("survtmle")
}
library(survtmle)
```

Below, we include the code that was used to perform the simulation study
and to analyze the RTS,S data for the manuscript.

## Simulation

Here, we demonstrate how to obtain results for a simulated data set.

```{r}
# sample size
n <- 1000

# set random seed
set.seed(1234)

# simulate data set
dat <- makeData(n = n)

# get the formula for the empirical estimate of the
# censoring distribution needed for input to survtmle
glm.ctime <- get.ctimeForm(trt = dat$trt, site = dat$adjustVars$site,
ftime = dat$ftime, ftype = dat$ftype)

# call survtmle to estimate cumulative incidence
object <- survtmle(ftime = dat$ftime,
ftype = dat$ftype,
adjustVars = dat$adjustVars,
trt = dat$trt,
glm.trt = "1",
glm.ftime = "trt*factor(site)",
glm.ctime = glm.ctime,
method = "mean",
t0=6)

# call trend_test to estimate projection
trend <- trend_test(object)
trend
```

A numerical approximation of the true value of $\beta_{0,n}$ may be obtained as follows.

```{r}
# get covariance matrix estimate
nabla_g <- sievetrend:::grad_g(object$est)
Upsilon_n <- nabla_g %*% cov(Reduce(cbind, object$ic)) %*% t(nabla_g)
# call getTruth function with this estimate several times
# and average (to increase accuracy)
beta_0n <- rowMeans(replicate(20, getTruth(Upsilon = Upsilon_n, n = 1e6)))[2]

# compare estimate to truth
c(est = trend$beta, truth = beta_0n)
```

The full code used to execute the simulation study is included in the simulation subdirectory. This includes the script `cent.R`, which is batched to a slurm via
the script `sce.sh`. The function `makeIllustrationPlot` was used to produce Figure 1.

## RTS,S Analysis

Due to existing privacy agreements, it is difficult to obtain access to the real RTS,S data. Nevertheless, a mock data set is distributed with the `survtmle` package that will serve as illustration for the outputation methodology. See `?rtss` for further description of the data set.

The `rtss` data set is a list of ten data sets each representing an outputed data set. They are formatted for analysis of a binary genetic mark, so we first replace the failure type column with a simulated genetic distance.

```{r}
# replace the ftype column in each data set
rtss_mod <- lapply(rtss, function(data){
# which were observed failures
fail_idx <- which(data$ftype > 0)
# number of observed failures
fail_n <- length(fail_idx)
# replace with a simulated version
data$ftype[fail_idx] <- rbinom(fail_n, 4, plogis(data$vaccine)) + 1
# collapse site variable into a single column
data$site <- as.numeric(1*(data$site1==1) + 2*(data$site2==1) +
3*(data$site3==1) + 4*(data$site4==1) +
5*(data$site5==1))
return(data)
})

# check out new ftype distribution for first
# outputed data set
table(rtss_mod[[1]]$ftype)
```

Now we use `survtmle` to estimate the cumulative incidence for each data set.

```{r}
rslt <- lapply(rtss_mod, function(data){
glm.ctime <- get.ctimeForm(trt = data$trt, site = data$site,
ftime = data$ftime, ftype = data$ftype)

object <- survtmle(ftime = data$ftime,
ftype = data$ftype,
adjustVars = data[,"site",drop=FALSE],
trt = data$vaccine,
glm.trt = "1",
glm.ftime = "trt*factor(site)",
glm.ctime = glm.ctime,
method = "mean",
t0=6)
return(object)
})
```

We can use the `getMO` function to obtain the averaged cumulative incidence
results and apply the trend_test on this scale.

```{r}
# obtain averaged results
mo_rslt <- getMO(rslt)

# apply trend test
mo_trend <- trend_test(mo_rslt)
mo_trend
```