Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kenhanscombe/plink-custom-r
Setup and starter script for the PLINK R plugin.
- Host: GitHub
- URL: https://github.com/kenhanscombe/plink-custom-r
- Owner: kenhanscombe
- Created: 2017-11-14T10:22:08.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2017-11-14T15:55:24.000Z (about 7 years ago)
- Last Synced: 2023-10-19T22:29:41.127Z (about 1 year ago)
- Topics: genetic-analysis, plink, r, statistical-analysis
- Language: R
- Homepage:
- Size: 238 KB
- Stars: 6
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.html
Custom analysis with PLINK R plugin
_If, like me, you thought this would be great but hadn't actually got around to figuring out how to use it, here is a script to play with and some setup instructions._
It is possible to call R from PLINK. This facility allows you to keep genotype and phenotype data in PLINK binary format and perform a custom analysis. Below is an example of how this facility can be used to retrieve model fit statistics.
More information on PLINK's R plugin functions is available in the PLINK 1.07 and 1.9 documentation, including details on changing the port, host, and socket.
Getting started
First, you will need to install the development version of PLINK and the latest version of R. Open R and install the relevant packages. Rserve is required; broom and a couple of tidyverse packages are needed for the specific example below. Make a note of the Rserve installation location printed by install.packages. You will need to point to it later.
To copy the R script, clone this repository.
git clone https://github.com/kenhanscombe/plink-custom-r.git
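As a quick check before moving on, the sketch below reports which of the packages named above are still missing and, if none are, prints where Rserve is installed. It uses only base R. The variable names are illustrative, and treating the printed package directory as the location to note for later is an assumption.

```r
# Packages named in the setup instructions above.
required <- c("Rserve", "broom", "tidyverse")

# TRUE for each package that can be loaded from the current library paths.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  # Directory of the installed Rserve package (assumed to be the location to note).
  print(system.file(package = "Rserve"))
}
```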
Retrieve model fit statistics
In an R script (e.g. plink_custom_analysis.R), define a custom function. This script defines a pseudo-R-squared for a logistic regression analysis and uses the broom functions glance and tidy to collect fit statistics. (Note: Before changing anything to suit your needs, see the Details section at the end.)
Rplink <- function(PHENO, GENO, CLUSTER, COVAR) {
  library(tidyverse)
  library(broom)

  # Nagelkerke pseudo-R-squared: Cox-Snell R-squared divided by its maximum.
  pseudo_rsq <- function(model) {
    dev <- model$deviance
    null_dev <- model$null.deviance
    model_n <- length(model$fitted.values)
    r2_cox_snell <- 1 - exp(-(null_dev - dev) / model_n)
    r2_nagelkerke <- r2_cox_snell / (1 - (exp(-(null_dev / model_n))))
    r2_nagelkerke
  }

  # Fit one logistic regression per SNP; PHENO == 2 marks cases (PLINK 1/2 coding).
  func <- function(snp) {
    m <- glm(PHENO == 2 ~ COVAR + snp, family = "binomial")
    rsq <- pseudo_rsq(m)
    # One-row glance() table flattened to a named numeric vector.
    glance_m <- glance(m) %>% unlist()
    # Last row of tidy() is the snp term: estimate, std. error, statistic, p-value.
    tidy_m <- tidy(m) %>% select(-term) %>% tail(n = 1) %>% unlist()
    summary_m <- c(tidy_m, glance_m, rsq)
    c(length(summary_m), summary_m)
  }

  apply(GENO, 2, func)
}
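The pseudo-R-squared used in the script can be checked outside PLINK. The sketch below is a base-R illustration on simulated data (the sample size, effect sizes, and variable names are all made up for the example). It recomputes the same quantity from log-likelihoods, relying on the fact that for a binary outcome the deviance equals minus twice the log-likelihood.

```r
set.seed(1)
n <- 200
snp <- rbinom(n, 2, 0.3)   # simulated 0/1/2 allele counts
covar <- rnorm(n)          # one simulated covariate
case <- rbinom(n, 1, plogis(-0.5 + 0.6 * snp + 0.3 * covar))

m <- glm(case ~ covar + snp, family = "binomial")

# Same calculation as pseudo_rsq in the script: Cox-Snell rescaled by its maximum.
pseudo_rsq <- function(model) {
  n_obs <- length(model$fitted.values)
  r2_cox_snell <- 1 - exp(-(model$null.deviance - model$deviance) / n_obs)
  r2_cox_snell / (1 - exp(-model$null.deviance / n_obs))
}
r2 <- pseudo_rsq(m)

# Cross-check using log-likelihoods of the full and intercept-only models.
ll_full <- as.numeric(logLik(m))
ll_null <- as.numeric(logLik(update(m, . ~ 1)))
r2_check <- (1 - exp(2 * (ll_null - ll_full) / n)) / (1 - exp(2 * ll_null / n))

print(c(deviance_based = r2, loglik_based = r2_check))
```

The two values should agree up to numerical tolerance, which is a cheap way to confirm the formula before running it across every SNP.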
To run the custom analysis, first start Rserve (pass the full path of the Rserve installation to R CMD). All data input and filtering flags to PLINK remain the same. Simply add --R [R script filename] to the PLINK call. The results of the custom analysis are written to plink.auto.R by default (as usual, you can change the file stem plink with --out).
R CMD /full/path/to/Rserve
plink \
  --bfile {prefix} \
  --pheno [filename] \
  --covar [filename] \
  (other optional filters ...)
  --logistic \
  --R plink_custom_analysis.R
NB. In the above example we're collecting model fit statistics from a logistic regression (using the excellent package broom). --logistic is an optional sanity check. Compare plink.assoc.logistic to plink.auto.R for effect size, signed statistic, and p-value. (Adding a header to the plink --R output helps. See the Output section below.)
Output
For each SNP in your analysis (i.e., each row in the output plink.auto.R), PLINK prepends the 4 values CHR, SNP, BP, and A1 to the vector of outputs v. The R read command below adds a header to the custom output. You could of course do this in a bash one-liner, but if you are going to use R to visualize your association results and model fit statistics, you can add the column names when reading in the data.
library(tidyverse)
# These col_names correspond to the custom analysis above. Newer broom versions
# may return extra glance() columns (e.g. nobs); adjust the names to match.
# In readr >= 2.0, read_table2() is deprecated in favour of read_table().
custom_plink_result <- read_table2(
  "plink.auto.R",
  col_names = c("chr", "snp", "bp", "a1", "estimate", "std_error", "statistic",
    "p_value", "null_deviance", "df_null", "logLik", "aic", "bic", "deviance",
    "df_residual", "pseudo_rsq"),
  col_types = cols(
    chr = col_integer(),
    snp = col_character(),
    bp = col_integer(),
    a1 = col_character(),
    estimate = col_double(),
    std_error = col_double(),
    statistic = col_double(),
    p_value = col_double(),
    null_deviance = col_double(),
    df_null = col_double(),
    logLik = col_double(),
    aic = col_double(),
    bic = col_double(),
    deviance = col_double(),
    df_residual = col_double(),
    pseudo_rsq = col_double()
  )
)
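If you would rather not depend on readr, the same header can be attached with base R. The sketch below assumes the output is whitespace-delimited with no header line, and it writes two placeholder rows to a temporary file purely so the example is self-contained; the values carry no meaning and the file name is hypothetical.

```r
# Column names matching the custom analysis above.
col_names <- c("chr", "snp", "bp", "a1", "estimate", "std_error", "statistic",
               "p_value", "null_deviance", "df_null", "logLik", "aic", "bic",
               "deviance", "df_residual", "pseudo_rsq")

# Two made-up result rows, for illustration only (not real PLINK output).
mock_file <- tempfile(fileext = ".auto.R")
writeLines(c(
  "1 rs0001 10583 A 0.12 0.05 2.40 0.016 270.1 199 -132.0 272.0 285.2 264.0 196 0.041",
  "1 rs0002 20412 T -0.03 0.06 -0.50 0.617 270.1 199 -134.9 277.8 291.0 269.8 196 0.002"
), mock_file)

# Explicit colClasses stop read.table from reading allele "T" as logical TRUE.
result <- read.table(mock_file, header = FALSE, col.names = col_names,
                     colClasses = c("integer", "character", "integer", "character",
                                    rep("numeric", 12)))
str(result)
```

To read real results, replace mock_file with the path to your own plink.auto.R.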
Multi-SNP model
If you want to inspect the overall model fit of a multi-SNP model, or compare the relative fit of multiple genetic variants (e.g. your 3 favourite SNPs) against a null model (e.g. 10 PCs), you cannot include the SNPs with the --condition flag. PLINK's --R always runs the analysis defined in Rplink. There are a couple of workarounds. One solution is to add the SNPs to the covariate file. First, convert the 3 SNPs to a 0/1/2 count of the reference allele with --recode A. The recoded SNPs appear in the last 3 columns of plink.raw. Add these 3 columns to the covariate file. Next, edit the model formula in Rplink so that it no longer includes the SNP term (i.e., delete + snp), then run your custom analysis once with --covar-number 1-10 (null) and a second time with --covar-number 1-13. Compare the 2 models.
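The step of adding the recoded SNPs to the covariate file can be sketched in base R. Everything below is a stand-in: the individuals, SNP names, and PC values are simulated, and the output file name is hypothetical. In practice you would read your own plink.raw and covariate file with read.table instead of building them in memory.

```r
set.seed(42)
n <- 5
ids <- data.frame(FID = paste0("fam", 1:n), IID = paste0("ind", 1:n),
                  stringsAsFactors = FALSE)

# Stand-in for plink.raw: six leading columns, then one column per recoded SNP.
raw <- cbind(ids, PAT = 0, MAT = 0, SEX = 1, PHENOTYPE = c(1, 2, 1, 2, 2),
             rs1_A = c(0, 1, 2, 0, 1),
             rs2_G = c(1, 1, 0, 2, 0),
             rs3_T = c(0, 0, 1, 1, 2))

# Stand-in for the covariate file: FID, IID, then 10 principal components.
pcs <- matrix(rnorm(n * 10), nrow = n, dimnames = list(NULL, paste0("PC", 1:10)))
covar <- cbind(ids, as.data.frame(pcs))

# The recoded SNPs are the last 3 columns of the .raw file.
snp_cols <- tail(names(raw), 3)

# Match on FID and IID so rows line up even if the two files are ordered differently.
covar_plus <- merge(covar, raw[, c("FID", "IID", snp_cols)],
                    by = c("FID", "IID"), sort = FALSE)

# Write the extended covariate file (header kept, whitespace-delimited).
out_file <- file.path(tempdir(), "covar_plus_snps.txt")
write.table(covar_plus, out_file, quote = FALSE, row.names = FALSE)
names(covar_plus)
```

With this column order, covariates 1-10 are the PCs and 11-13 are the SNPs, which is what the two --covar-number ranges rely on.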
Details (summarised from PLINK 1.07 and 1.9 documentation)
For a sample of size n, genotyped at l genetic variants, including c covariates, all genotypes, phenotypes, covariates and cluster membership are accessible within the custom R script as:
PHENO A vector of phenotypes of length n.
GENO An n x l matrix of genotypes.
CLUSTER A vector of cluster membership codes of length n.
COVAR An n x c matrix of covariates.
The R script defines a function Rplink, with obligatory header and return value, as follows:
Rplink <- function(PHENO, GENO, CLUSTER, COVAR) {
  # A function f is applied to the columns of GENO (i.e. to each genetic variant) and
  # must return a numeric vector v, combined with its length.
  f <- function(s) {
    # Function body
    c(length(v), v)
  }
  apply(GENO, 2, f)
}
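Because Rplink is an ordinary R function, its return shape can be checked locally before involving PLINK or Rserve. The sketch below is a minimal dry run under stated assumptions: the inputs are simulated with the dimensions described above, and the function body (per-SNP estimate, standard error, z statistic, and p-value from a logistic regression) is only an illustrative choice of v.

```r
Rplink <- function(PHENO, GENO, CLUSTER, COVAR) {
  f <- function(s) {
    m <- glm(PHENO == 2 ~ COVAR + s, family = "binomial")
    # v: estimate, standard error, z value, and p-value for the SNP term.
    v <- unname(coef(summary(m))["s", ])
    c(length(v), v)
  }
  apply(GENO, 2, f)
}

# Simulated inputs: n individuals, l variants, 2 covariates.
set.seed(7)
n <- 100
l <- 3
PHENO <- rbinom(n, 1, 0.5) + 1                      # 1 = control, 2 = case
GENO <- matrix(rbinom(n * l, 2, 0.3), nrow = n)     # 0/1/2 allele counts
CLUSTER <- rep(1, n)                                # single cluster
COVAR <- matrix(rnorm(n * 2), nrow = n)

res <- Rplink(PHENO, GENO, CLUSTER, COVAR)
dim(res)   # one column per variant; first row holds length(v)
```

Each column of the result corresponds to one variant, and its first entry is the number of values that follow, which is the layout the header comment in the template describes.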