

An R Function for calculating genetic risk scores in the UK Biobank cohort using DNA Nexus





# DNANexus GRS

## Description

This repository contains two functions:

**Calculate_GRS.R** is a function built to evaluate genetic risk scores in the UK Biobank cohort, using the imputed genotypes and the RStudio Workbench on the DNA Nexus platform. It takes one input, a file listing the chromosome, base pair, other allele, effect allele, and weight for each SNP, and returns a data frame with two columns, `eid` and `grs`. Please note that it takes about 20 minutes to compute a GRS on the default DNA Nexus settings. Most of this time is spent extracting SNPs from the BGEN files, which is a slow process through R.

Important notes:

* If any SNPs are missing from the imputed data, the function silently excludes them and gives no warning. I'm working on it.
* SNPs must be entered in chr bp format, and must be in build 37. This is to match the indexed BGEN files stored on the DNA Nexus RAP.
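
As a rough illustration of the scoring step, the sketch below assumes the GRS is the conventional weighted sum of effect-allele dosages, which this README does not state explicitly. It also shows one possible way to report SNPs that were not found. The dosage matrix, the `chr:bp` keys, and the weights are all made-up values for illustration, not output from `Calculate_GRS.R`.

```r
# Hypothetical effect-allele dosages (0 to 2) for three participants and two SNPs.
# Row names stand in for eid; column names use an illustrative "chr:bp" key.
dosages <- matrix(
  c(0, 1,
    1, 2,
    2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("1001", "1002", "1003"), c("8:128077146", "1:1234567"))
)

# Hypothetical SNP weights, including one SNP that is absent from the genotype data.
snps <- data.frame(
  key    = c("8:128077146", "1:1234567", "2:2345678"),
  weight = c(0.5, 0.2, 0.1),
  stringsAsFactors = FALSE
)

# Report SNPs that could not be found instead of dropping them silently.
missing_snps <- setdiff(snps$key, colnames(dosages))
if (length(missing_snps) > 0) {
  message("SNPs not found and excluded: ", paste(missing_snps, collapse = ", "))
}

# Weighted sum over the SNPs that are present (assumed definition of the score).
found <- snps[snps$key %in% colnames(dosages), ]
grs <- data.frame(
  eid = rownames(dosages),
  grs = as.vector(dosages[, found$key, drop = FALSE] %*% found$weight)
)
print(grs)
```

Keeping the missing-SNP check separate from the scoring step means the score itself is unchanged while the excluded SNPs become visible to the user.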

**extract_snp.R** is not required for Calculate_GRS, but is a potentially useful related function that extracts the genotype information for one SNP from the imputed data and stores it in a data frame. The function takes two inputs, chromosome and base pair, and returns a list with two elements, one with the genotype data and one with the SNP info. The genotype data is a data frame with two columns, `id` and `genotype`. The SNP info contains `chromosome`, `position`, `rsid`, `number_of_alleles`, `allele0`, and `allele1`.

`extract_snp` can be run using, e.g., `extract_snp(8,128077146)`. Each call takes about one minute, so it is not recommended for extracting many SNPs. The speed of both functions is limited by the speed of `bgen.load`, and a future release will add an extra function to make this process quicker for multiple SNPs.

## Example script for Calculate_GRS

This script has been written to run on the RStudio Workbench on DNA Nexus, which at the time of writing runs

First, run



Then, a genetic risk score can be generated by running


Where `snp file` is a tab-separated file, which looks like this


See the file `contigrs` as an example.
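
As a rough sketch of the five-field layout described in the Description (chromosome, base pair, other allele, effect allele, weight), the snippet below writes a small tab-separated file and reads it back. The column names, the presence of a header row, the second SNP, and all alleles and weights are illustrative assumptions. The `contigrs` file remains the authoritative example of what the function expects.

```r
# Hypothetical SNP list: column names, header row, alleles and weights are illustrative only.
# Positions are stored as integers so they are not written in scientific notation.
snps <- data.frame(
  chr    = c(8L, 1L),
  bp     = c(128077146L, 1234567L),
  other  = c("A", "C"),
  effect = c("G", "T"),
  weight = c(0.5, 0.2)
)

# Write the table as a tab-separated text file in a temporary location.
snp_file <- tempfile(fileext = ".txt")
write.table(snps, snp_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Print the raw file, then read it back to confirm it parses as five columns.
cat(readLines(snp_file), sep = "\n")
check <- read.delim(snp_file, stringsAsFactors = FALSE)
str(check)
```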

## Complete Script

Users on the Exomes_450K Project at the University of Exeter can run the following example to calculate the Conti et al. GRS and test its predictive power against prostate cancer.

This script relies on

    install.packages( "", repos = NULL, type = "source" )

    library(devtools) # provides source_url()
    library(dplyr)    # provides %>%, mutate() and left_join()
    library(pROC)     # provides roc()

    system('dx download file-GP3GfZjJZ8kYP25V5VZ1BGfx') # this file has the Conti et al. SNPs in it
    source_url("") # makes Calculate_GRS available
    grs <- generate_grs('conti_et_al_prostate_snps') # uses the downloaded file to build a GRS

    source_url("") # this script is used to derive the prostate cancer phenotype
    prostate_cancer <- first_occurence(cancer='C61',ICD10='C61') %>% mutate(prca=1) # uses my own first occurrence code

    # assumption: the two tables are joined on eid, which requires prostate_cancer to have an eid column
    all_data <- left_join(grs, prostate_cancer, by = "eid")
    all_data$prca[is.na(all_data$prca)] <- 0 # all_data$prca is now a vector of 1s and 0s

    # plot density plots of the distribution in cases and controls

    # build a ROC curve with the AUC
    logit <- glm(prca~grs, data = all_data, family = "binomial")
    prob <- predict(logit, newdata = all_data, type = "response")
    roc(all_data$prca ~ prob, plot = TRUE, print.auc = TRUE, ci=TRUE)
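
For readers without access to the UK Biobank data, the modelling step can be tried on simulated data. The sketch below is not part of the original script: it simulates a GRS and case status with an arbitrary effect size, fits the same logistic model, and computes the AUC in base R through the rank-sum (Mann-Whitney) identity rather than `roc()`, so no extra packages are needed.

```r
set.seed(1)
n <- 2000

# Simulated GRS and case status; the intercept and effect size are arbitrary choices.
grs <- rnorm(n)
prca <- rbinom(n, 1, plogis(-2 + 0.8 * grs))
all_data <- data.frame(grs = grs, prca = prca)

# Same model as in the script: logistic regression of case status on the GRS.
logit <- glm(prca ~ grs, data = all_data, family = "binomial")
prob <- predict(logit, newdata = all_data, type = "response")

# AUC via the rank-sum identity: the probability that a randomly chosen case
# has a higher predicted risk than a randomly chosen control.
r <- rank(prob)
n1 <- sum(all_data$prca == 1)
n0 <- sum(all_data$prca == 0)
auc <- (sum(r[all_data$prca == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
print(auc)
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, and values closer to 1 indicate better discrimination.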