https://github.com/benkeser/sievetrend

Tests for trends in vaccine efficacy by genetic distance
https://github.com/benkeser/sievetrend

competing-risks hamming-distance sieve-analysis tmle vaccines

Last synced: about 2 months ago
JSON representation

Tests for trends in vaccine efficacy by genetic distance

Host: GitHub
URL: https://github.com/benkeser/sievetrend
Owner: benkeser
License: other
Created: 2017-09-28T19:26:08.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2017-09-28T19:58:41.000Z (over 7 years ago)
Last Synced: 2025-02-09T09:11:14.066Z (3 months ago)
Topics: competing-risks, hamming-distance, sieve-analysis, tmle, vaccines
Language: R
Size: 25.4 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE

Awesome Lists containing this project

README

        ---

title: "Assessing trends in vaccine efficacy by pathogen genetic distance"

author: "David Benkeser, Michal Juraska, and Peter Gilbert"

date: "September 28, 2017"

output:

    md_document:

        variant: markdown_github

---

    

```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = TRUE)

```

## Installation of `sievetrend`

The `sievetrend` repository is available for download as an `R` package and may be

downloaded directly from GitHub as follows. 

```{r, messages = FALSE, warnings = FALSE}

# install sievetrend from GitHub

devtools::install_github("benkeser/sievetrend")

library(sievetrend)

# also will require package survtmle

if(!("survtmle" %in% row.names(installed.packages()))){

	install.packages("survtmle")	

}

library(survtmle)

```

Below, we include the code that was used to perform the simulation study

and to analyze the RTS,S data for the manuscript. 

## Simulation

Here, we demonstrate how to obtain results for a simulated data set. 

```{r}

# sample size

n <- 1000

# set random seed

set.seed(1234)

# simulate data set

dat <- makeData(n = n)

# get the formula for the empirical estimate of the

# censoring distribution needed for input to survtmle

glm.ctime <- get.ctimeForm(trt = dat$trt, site = dat$adjustVars$site, 

                           ftime = dat$ftime, ftype = dat$ftype)

# call survtmle to estimate cumulative incidence

object <- survtmle(ftime = dat$ftime,

                  ftype = dat$ftype,

                  adjustVars = dat$adjustVars,

                  trt = dat$trt,

                  glm.trt = "1",

                  glm.ftime = "trt*factor(site)",

                  glm.ctime = glm.ctime,

                  method = "mean",

                  t0=6)

# call trend_test to estimate projection

trend <- trend_test(object)

trend

```

A numerical approximation of the true value of $\beta_{0,n}$ may be obtained as follows.

```{r}

# get covariance matrix estimate

nabla_g <- sievetrend:::grad_g(object$est)

Upsilon_n <- nabla_g %*% cov(Reduce(cbind, object$ic)) %*% t(nabla_g) 

# call getTruth function with this estimate several times 

# and average (to increase accuracy)

beta_0n <- rowMeans(replicate(20, getTruth(Upsilon = Upsilon_n, n = 1e6)))[2]

# compare estimate to truth

c(est = trend$beta, truth = beta_0n)

```

The full code used to execute the simulation study is included in the simulation subdirectory. This includes the script `cent.R`, which is batched to a slurm via

the script `sce.sh`. The function `makeIllustrationPlot` was used to produce Figure 1.

## RTS,S Analysis

Due to existing privacy agreements, it is difficult to obtain access to the real RTS,S data. Nevertheless, a mock data set is distributed with the `survtmle` package that will serve as illustration for the outputation methodology. See `?rtss` for further description of the data set. 

The `rtss` data set is a list of ten data sets each representing an outputed data set. They are formatted for analysis of a binary genetic mark, so we first replace the failure type column with a simulated genetic distance.

```{r}

# replace the ftype column in each data set

rtss_mod <- lapply(rtss, function(data){

	# which were observed failures

	fail_idx <- which(data$ftype > 0)

	# number of observed failures

	fail_n <- length(fail_idx)

	# replace with a simulated version

	data$ftype[fail_idx] <- rbinom(fail_n, 4, plogis(data$vaccine)) + 1

	# collapse site variable into a single column

	data$site <- as.numeric(1*(data$site1==1) + 2*(data$site2==1) + 

	                        3*(data$site3==1) + 4*(data$site4==1) + 

	                        5*(data$site5==1))

	return(data)

})

# check out new ftype distribution for first

# outputed data set

table(rtss_mod[[1]]$ftype)

```

Now we use `survtmle` to estimate the cumulative incidence for each data set.

```{r}

rslt <- lapply(rtss_mod, function(data){

	glm.ctime <- get.ctimeForm(trt = data$trt, site = data$site, 

                           ftime = data$ftime, ftype = data$ftype)

	object <- survtmle(ftime = data$ftime,

                  ftype = data$ftype,

                  adjustVars = data[,"site",drop=FALSE],

                  trt = data$vaccine,

                  glm.trt = "1",

                  glm.ftime = "trt*factor(site)",

                  glm.ctime = glm.ctime,

                  method = "mean",

                  t0=6)

	return(object)

})

```

We can use the `getMO` function to obtain the averaged cumulative incidence 

results and apply the trend_test on this scale.

```{r}

# obtain averaged results

mo_rslt <- getMO(rslt)

# apply trend test

mo_trend <- trend_test(mo_rslt)

mo_trend

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/benkeser/sievetrend

Awesome Lists containing this project

README