R package to estimate partial genetic correlations
https://github.com/GEMINI-multimorbidity/partialLDSC


---
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      fig.path = "doc/Figures/README-",
                      out.width = "100%")
# for tibbles...
options(pillar.neg = FALSE,    # do not print negative numbers in red
        pillar.subtle = FALSE, # turn off highlighting of significant digits
        tibble.width = 170)    # default=95, increase it to make it readable

# # automatically updates manual
devtools::build_manual()
system("mv partialLDSC_0.0.1.0.pdf doc/partialLDSC-manual.pdf")
library(partialLDSC)

A = readRDS("inst/Data/A.RDS")
```

# partialLDSC

:information_source: `partialLDSC` is still under active development.

[![](https://img.shields.io/badge/version-0.1.1-informational.svg)](https://github.com/GEMINI-multimorbidity/partialLDSC)
[![](https://img.shields.io/github/last-commit/GEMINI-multimorbidity/partialLDSC.svg)](https://github.com/GEMINI-multimorbidity/partialLDSC/commits/master)
[![](https://img.shields.io/badge/lifecycle-experimental-orange)](https://www.tidyverse.org/lifecycle/#experimental)
[![DOI](https://zenodo.org/badge/716022171.svg)](https://zenodo.org/doi/10.5281/zenodo.12721532)

## Overview
[//]:*******

`partialLDSC` is an R-package to estimate partial genetic correlations from GWAS summary statistics, and compare them to their unadjusted counterparts, to quantify the contribution of a given confounder in explaining genetic similarity between conditions. The partial genetic correlations between two conditions correspond to their genetic correlation, holding the genetic effects of a potential confounder constant. Differences between unadjusted and partial estimates are not necessarily due to a causal effect of the potential confounder on both conditions and further (causal inference) analyses might be needed to better describe the relationship between the conditions and the potential confounder.
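
As a point of reference, the textbook first-order partial correlation, written for two conditions $X$ and $Y$ and a confounder $Z$, shows how the three pairwise genetic correlations $r_g$ combine. The notation below is introduced here for illustration only; the estimator and standard errors actually used by the package are described in the preprint.

$$
r_{g}(X, Y \mid Z) = \frac{r_{g}(X,Y) - r_{g}(X,Z)\, r_{g}(Y,Z)}{\sqrt{\left(1 - r_{g}(X,Z)^{2}\right)\left(1 - r_{g}(Y,Z)^{2}\right)}}
$$

When the confounder is only weakly genetically correlated with both conditions, the partial estimate stays close to the unadjusted one.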

It relies on cross-trait LD-score regression (LDSC), as first described by [Bulik-Sullivan, B. et al. - “An atlas of genetic correlations across human diseases and traits.”](https://pubmed.ncbi.nlm.nih.gov/26414676/).

Our implementation of LDSC is based on the one from [`GenomicSEM`](https://github.com/GenomicSEM/GenomicSEM/). GWAS summary statistics should be pre-processed with the `munge` function that package provides before running the analysis.

There are two main functions available:

- **`partial_ldsc()`**
main function to estimate unadjusted and partial genetic correlations (as well as heritabilities, on the observed scale only), and to compare them to assess whether adjusting for the potential confounder's genetic effects significantly affects the pairwise genetic correlation estimates.

- **`forest_plot()`**
main function to visualise the results.

More details about their usage can be found in the [manual](doc/partialLDSC-manual.pdf).

For details on the method, see our [preprint](https://doi.org/10.1101/2024.07.10.24309772). If you use the `partialLDSC` package, please cite:

> Mounier _et al._ (2024) Genetics identifies obesity as a shared risk factor for co-occurring multiple long-term conditions. medRxiv [https://doi.org/10.1101/2024.07.10.24309772](https://doi.org/10.1101/2024.07.10.24309772)

## Installation
[//]:*******

You can install the current version of `partialLDSC` with:
```{r install-package, echo=TRUE, eval=F, message=FALSE, results='hide'}
# Directly install the package from github
# install.packages("remotes")
remotes::install_github("GEMINI-multimorbidity/partialLDSC")
library(partialLDSC)
```

## Usage
[//]:*******

To run an analysis with `partialLDSC`, two types of input are needed:

#### 1. The munged GWAS summary statistics (`conditions` & `confounder`):

More information about how to munge the summary statistics can be found in the [`GenomicSEM`](https://github.com/GenomicSEM/GenomicSEM/) R-package documentation, [here](https://github.com/GenomicSEM/GenomicSEM/wiki/3.-Models-without-Individual-SNP-effects#step-1-munge-the-summary-statistics).
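
A minimal sketch of the munging step is given below. The raw file names, sample sizes and the path to the HapMap3 SNP list are hypothetical placeholders, and the call only runs if `GenomicSEM` is installed and the files exist. The arguments shown (`files`, `hm3`, `trait.names`, `N`) follow the `GenomicSEM` documentation linked above, which remains the authoritative reference.

```r
# Hypothetical raw GWAS files and sample sizes; replace with your own.
raw_files   <- c("OA_raw.txt.gz", "T2D_raw.txt.gz")
trait_names <- c("OA", "T2D")
sample_size <- c(100000, 150000)  # made-up values, for illustration only
hm3_snplist <- "w_hm3.snplist"    # HapMap3 SNP list (path is an assumption)

# Only attempt munging when the package and all input files are available
inputs_ready <- requireNamespace("GenomicSEM", quietly = TRUE) &&
  all(file.exists(c(raw_files, hm3_snplist)))

if (inputs_ready) {
  GenomicSEM::munge(files = raw_files,
                    hm3 = hm3_snplist,
                    trait.names = trait_names,
                    N = sample_size)
} else {
  message("GenomicSEM or the input files are missing; nothing was munged.")
}

# Assumption: munge() writes one "<trait name>.sumstats.gz" file per trait
# in the working directory; these paths are then passed to partial_ldsc().
expected_outputs <- paste0(trait_names, ".sumstats.gz")
```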

#### 2. The input files for LDSC (`ld`):

LD scores are needed. These are the same as the ones required by the [`GenomicSEM`](https://github.com/GenomicSEM/GenomicSEM/) R-package and can be downloaded directly from the link they provide:

> Expects LD scores formatted as required by the original LD score regression software. Weights for the European population can be obtained by downloading the eur_w_ld_chr folder in the link below (Note that these are the same weights provided by the original developers of LDSC): https://utexas.box.com/s/vkd36n197m8klbaio3yzoxsee6sxo11v
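
Before launching an analysis, it can help to check that the downloaded folder is complete. The helper below is a hypothetical convenience function, not part of `partialLDSC`; it assumes the usual per-chromosome file naming of the `eur_w_ld_chr` folder (`<chr>.l2.ldscore.gz` and `<chr>.l2.M_5_50` for chromosomes 1 to 22).

```r
# Hypothetical helper: list expected LD score files that are missing.
# Assumed naming: "<chr>.l2.ldscore.gz" and "<chr>.l2.M_5_50".
check_ld_folder <- function(ld, chromosomes = 1:22) {
  ld <- path.expand(ld)  # resolve "~" to the home directory
  expected <- c(paste0(chromosomes, ".l2.ldscore.gz"),
                paste0(chromosomes, ".l2.M_5_50"))
  missing <- expected[!file.exists(file.path(ld, expected))]
  if (length(missing) > 0) {
    warning(length(missing), " expected file(s) not found in ", ld,
            ", e.g. ", missing[1])
  }
  invisible(missing)
}

# Usage: an empty result means all expected files were found
# missing_files <- check_ld_folder("~/eur_w_ld_chr")
```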

### Analysis
[//]:-------------------------------

Before running the examples, please make sure you have downloaded the LD score files. You may also need to modify the `ld` argument to point to the correct path. Note that when running the analysis with your own GWAS summary statistics, you will first need to munge them properly.

- **Example A**

```{r exampleA, eval=F}
# Using GEMINI GWAS summary statistics for three conditions
# (osteoarthritis: OA, type 2 diabetes: T2D, benign hyperplasia of prostate: BPH)
# + GIANT GWAS summary statistics for the confounder (BMI)
# (1,150,000 SNPs - stored in gzipped files)
OA_file <- system.file("Data/", "OA_GEMINI.sumstats.gz", package="partialLDSC")
T2D_file <- system.file("Data/", "diabetes_type_2_GEMINI.sumstats.gz", package="partialLDSC")
BPH_file <- system.file("Data/", "BPH_GEMINI.sumstats.gz", package="partialLDSC")
BMI_file <- system.file("Data/", "BMI_Yengo_2018.txt.sumstats.gz", package="partialLDSC")

# launch analysis (using default number of blocks)
A = partial_ldsc(conditions = c(OA_file, T2D_file, BPH_file),
                 confounder = BMI_file,
                 condition.names = c("OA", "T2D", "BPH"),
                 confounder.name = "BMI",
                 ld = "~/eur_w_ld_chr",
                 log.name = "Example_A")

```

Show log

```{r logAdisplay,echo=FALSE, eval=TRUE}
logA = readr::read_lines(system.file("Data/", "Example_A_ldsc.log", package="partialLDSC"))
cat(logA, sep="\n")
```


### Results
[//]:-------------------------------

**`partial_ldsc()`** returns a named list containing the following results:

- `res_diff` (pairwise results)

    - `condition.1` : first condition in the pair,
    - `condition.2` : second condition in the pair,
    - `rg` : unadjusted genetic correlation between the two conditions,
    - `rg.SE` : standard error of the unadjusted genetic correlation between the two conditions,
    - `partial_rg` : partial genetic correlation between the two conditions,
    - `partial_rg.SE` : standard error of the partial genetic correlation between the two conditions,
    - `rg_cov` : covariance between the unadjusted and the partial correlation estimates for the pair,
    - `diff.T` : test statistic used to test for the difference between the unadjusted and the partial correlation estimates for the pair,
    - `diff.P` : p-value corresponding to the test statistic used to test the difference between the unadjusted and the partial correlation estimates for the pair.


- `S` : estimated genetic covariance matrix for all conditions + confounder.

- `V` : variance covariance matrix of the parameter estimates in `S`.

- `S_Stand` : estimated genetic correlation matrix for all conditions + confounder.

- `V_Stand` : variance covariance matrix of the parameter estimates in `S_Stand`.

- `partial.S` : estimated partial genetic covariance matrix for all conditions.

- `partial.V` : variance covariance matrix of the parameter estimates in `partial.S`.

- `partial.S_Stand` : estimated partial genetic correlation matrix for all conditions.

- `partial.V_Stand` : variance covariance matrix of the parameter estimates in `partial.S_Stand`.

- `I` : matrix containing the cross-trait intercepts.
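
To illustrate how the covariance and correlation matrices listed above relate to each other, here is a small sketch on a toy matrix. The values are made up, and the partial covariance is computed with the standard Schur complement; that this matches the package's internal computation of `partial.S` is an assumption. Object names use underscores (`partial_S`) to make clear they are not package output.

```r
# Toy genetic covariance matrix (illustrative values, not real estimates):
# two conditions (C1, C2) and one confounder (Z).
S <- matrix(c(0.10, 0.04, 0.06,
              0.04, 0.20, 0.08,
              0.06, 0.08, 0.25),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("C1", "C2", "Z"), c("C1", "C2", "Z")))

# Unadjusted genetic correlations: standardise the covariance matrix
S_Stand <- cov2cor(S)

# Partial covariance of the conditions given Z (Schur complement of S[Z, Z])
cond <- c("C1", "C2")
partial_S <- S[cond, cond] -
  S[cond, "Z", drop = FALSE] %*% S["Z", cond, drop = FALSE] / S["Z", "Z"]

# Partial genetic correlations: standardise the partial covariance matrix
partial_S_Stand <- cov2cor(partial_S)

# Compare the unadjusted and partial correlation for the pair C1-C2
c(unadjusted = S_Stand["C1", "C2"], partial = partial_S_Stand["C1", "C2"])
```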

##### Additionally, log and results files are created in the current working directory:
- **_ldsc.log** - log file
- **_difference.tsv** - results file

- **Example A**

```{r resultsA}
### structure of the results

str(A)

### matrix of unadjusted genetic correlations

A$S_Stand

### pairwise results
# in this case, we observed a significant difference between the unadjusted and the partial
# genetic correlation estimates only for OA and T2D.
# This does make sense since the genetic correlation between BMI and BPH is very low,
# explaining why adjusting for BMI does not affect the genetic correlation between BPH
# and the other conditions.

A$res_diff

### functions to list conditions / pairs

get_conditions(A)

get_pairs(A)

### forest plot
forest_plot(A)
```
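
The sketch below shows one plausible way the `diff.T` and `diff.P` columns could be derived from the other columns of `res_diff`. It assumes a two-sided z-test for the difference between two correlated estimates and a Bonferroni threshold across pairs; both are assumptions made for illustration, and all numbers are made up rather than taken from Example A.

```r
# Toy pairwise results using the column names of `res_diff`
# (all numbers are made up for illustration).
res_diff <- data.frame(
  condition.1   = c("OA", "OA", "T2D"),
  condition.2   = c("T2D", "BPH", "BPH"),
  rg            = c(0.30, 0.10, 0.12),
  rg.SE         = c(0.040, 0.050, 0.050),
  partial_rg    = c(0.18, 0.10, 0.11),
  partial_rg.SE = c(0.045, 0.050, 0.052),
  rg_cov        = c(0.0015, 0.0024, 0.0025)
)

# Assumed test: z-statistic for the difference of two correlated estimates,
# Var(rg - partial_rg) = rg.SE^2 + partial_rg.SE^2 - 2 * rg_cov
diff_var <- with(res_diff, rg.SE^2 + partial_rg.SE^2 - 2 * rg_cov)
res_diff$diff.T <- with(res_diff, rg - partial_rg) / sqrt(diff_var)
res_diff$diff.P <- 2 * pnorm(-abs(res_diff$diff.T))

# Assumed multiple-testing rule: Bonferroni correction across all pairs
significant <- res_diff[res_diff$diff.P < 0.05 / nrow(res_diff), ]
significant[, c("condition.1", "condition.2", "diff.T", "diff.P")]
```

Accounting for `rg_cov` matters because the unadjusted and partial estimates are computed from the same data, so treating them as independent would overstate the variance of their difference.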

## Runtime
[//]:*******

Example A: ~55 seconds

Runtime can be influenced by the number of traits.

Results from analyses performed on a Windows 10 laptop - Processor : Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz 2.21 GHz - Memory : 16.0 GB.

## Citation
[//]:*******

If you use the `partialLDSC` package, please cite: ...

## Contact
...