Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/lcpilling/gwasRtools

Some useful tools for processing GWAS output
https://github.com/lcpilling/gwasRtools

Last synced: 2 months ago
JSON representation

Some useful tools for processing GWAS output

Host: GitHub
URL: https://github.com/lcpilling/gwasRtools
Owner: lcpilling
License: gpl-3.0
Created: 2023-06-19T15:49:46.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-07-09T20:06:46.000Z (7 months ago)
Last Synced: 2024-07-10T00:36:12.614Z (7 months ago)
Language: R
Homepage: https://lcpilling.github.io/gwasRtools
Size: 6.07 MB
Stars: 12
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE

Awesome Lists containing this project

awesome-complex-trait-genetics - gwasRtools

README

        

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "100%",

  fig.width = 6, height = 4, dpi = 150

)

set.seed(1234)

```



# gwasRtools

Some useful R functions for processing GWAS output

[![](https://img.shields.io/badge/version-0.1.3-informational.svg)](https://github.com/lcpilling/gwasRtools)

[![](https://img.shields.io/github/last-commit/lcpilling/gwasRtools.svg)](https://github.com/lcpilling/gwasRtools/commits/master)

[![](https://img.shields.io/badge/lifecycle-experimental-orange)](https://www.tidyverse.org/lifecycle/#experimental)

[![DOI](https://zenodo.org/badge/655790727.svg)](https://zenodo.org/badge/latestdoi/655790727)

## List of functions

  - [get_loci()](#get_loci)

  - [get_nearest_gene()](#get_nearest_gene)

  - [lambda_gc()](#lambda_gc)

## Installation

To install `gwasRtools` from [GitHub](https://github.com/) with:

```r

remotes::install_github("lukepilling/gwasRtools")

```

## Example dataset

The package includes a subset of variants from the Graham et al. 2021 GWAS of LDL in 1,320,016 Europeans (GWAS catalog GCST90239658). I will use this throughout.

``` {r}

library(gwasRtools)

head(gwas_example)

```

## get_loci()

Determine loci from a GWAS summary statistics file. Use distance from lead significant SNP to estimate independent loci in GWAS summary stats [default distance = 500kb]. By default, the HLA region is treated is one continuous locus due to the complex LD. Uses -log10(p) derived from BETA/SE so does not need P as input. Example below with default input:

``` {r}

gwas_loci = get_loci(gwas_example)

head(gwas_loci)

gwas_loci |> dplyr::filter(lead==TRUE) |> head()

```

 - Loci are numbered. Variants within a locus (i.e., significant below the `p_threshold` and less than `n_bases` from last significant variant).

 - Lead variant for each locus is highlighted where `lead==TRUE` (i.e., smallest p-value for any variant within a locus)

### Use LD clumping to identify independent SNPs at the same locus 

Setting option `get_ld_indep=TRUE` will use {[ieugwasr](https://github.com/MRCIEU/ieugwasr)} package `ld_clump()` function to run Plink LD clumping. 

Default is to use a local Plink installation (this is faster) with EUR reference panel. But setting option `ld_clump_local` to FALSE will use the online IEU API. See the {ieugwasr} docs for details. Default R2 threshold for LD pruning is 0.01 (modify with `ld_pruning_r2` option). 

``` {r}

gwas_loci = get_loci(gwas_example, get_ld_indep=TRUE)

head(gwas_loci)

gwas_loci |> dplyr::filter(lead==TRUE) |> head()

```

Where before, locus 3 would only have had one lead SNP (based on distance/lowest p-value) `ld_clump()` has identified multiple independent variants in the region.

Note that there are now three `lead` columns:

 - `lead_dist` is the original `lead` column, simply based on distance and p-values

 - `lead_ld` is the direct results from `ld_clump()`

 - The `lead` column combines the two (some SNPs are missing from LD panel so it does not always choose the lowest p-value if only 1 variant identified at a locus).

** Note that `ld_clump()` only considers R^2 when defining independent variants, not D' -- you should perform additional checking/conditional analysis where relevant for `lead` variants in close proximity.

## get_nearest_gene()

Get nearest gene from a set of variants using GENCODE data. Need to provide a data.frame of variant IDs (e.g., rsids), CHR and POS. Default column names are the same as for `get_loci()`. Default max distance from variant to gene is 100kb.

``` {r}

gwas_loci = get_nearest_gene(gwas_loci, build=37)

head(gwas_loci)

gwas_loci |> dplyr::filter(lead==TRUE) |> head()

```

 - If `dist` is positive, the variant is intergenic, and this is the distance to the closest gene.

 - If `dist` is negative, the variant is within a gene, and this is the distance to the start of the gene.

 - If `dist` is NA, the variant is not within `n_bases` of a gene in GENCODE.

## lambda_gc()

Estimate inflation of test statistics. Lambda GC compares the median test statistic against the expected median test statistic under the null hypothesis of no association. For well-powered GWAS of traits with a known polygenic inheritance, we expect inflation of lambda GC. For traits with no expected association, we expect lambda GC to be around 1.

``` {r}

lambda_gc(gwas_example$P)

```