https://github.com/stracquadaniolab/baghera

Bayesian Gene Heritability Analysis from GWAS summary statistics

bayesian-inference bioinformatics biostatistics gwas pymc3

# Bayesian Gene Heritability Analysis

![alt-ci](https://github.com/stracquadaniolab/baghera/workflows/Release/badge.svg)
![alt-version](https://img.shields.io/github/v/tag/stracquadaniolab/baghera)
![alt-version](https://anaconda.org/stracquadaniolab/baghera/badges/version.svg)

The Bayesian Gene Heritability Analysis software (BAGHERA) estimates the contribution
to the heritability of a trait or disease of all the SNPs in the genome (genome-wide heritability)
and of the SNPs near protein-coding genes (gene-level heritability).

BAGHERA requires only summary statistics from a genome-wide association study (GWAS),
LD scores calculated from a population matching the ethnicity of the GWAS cohort, and
a gene annotation file in GTF format.

## Installation

The easiest and fastest way to install BAGHERA is with conda:

```
$ conda install -c stracquadaniolab -c bioconda -c conda-forge baghera
```

## Tutorial

A typical BAGHERA analysis consists of 3 steps:

1. Build a SNP annotation file, in which SNPs are annotated to genes and
assigned an LD score. We used precomputed LD scores
(https://github.com/bulik/ldsc) for the set of variants from the European
population in 1000 Genomes, and protein-coding genes as annotated in Gencode
v31 (https://www.gencodegenes.org/releases/current.html). Overlapping genes
within 50 kb were considered together, yielding a dataset of 15,000
non-overlapping genes. To build your own annotation files, run the following
command, supplying a value for each flag (the Example section shows a complete
invocation):

```
$ baghera-tool create-files -l -a -s -g
```

2. Annotate the summary statistics with the SNP annotation built in step 1. We used summary statistics available at http://www.nealelab.is/uk-biobank and ran the command below:

```
$ baghera-tool generate-snp-file -s -i -o -a
```

3. Run the regression.

```
$ baghera-tool gene-heritability --sweeps --burnin --n-chains --n-cores -m
```
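Step 1 notes that overlapping genes within 50 kb were considered together. The sketch below illustrates one way such a grouping could be computed; it is not BAGHERA's implementation. The `Gene` tuple, the `merge_nearby_genes` function, and the merging rule (two consecutive genes on the same chromosome are merged when the gap between them is at most `window` base pairs) are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (not a BAGHERA type)."""
    chrom: str
    start: int
    end: int
    name: str


def merge_nearby_genes(genes: List[Gene], window: int = 50_000) -> List[Gene]:
    """Merge genes on the same chromosome separated by at most `window` bp.

    Assumption: this gap-based rule is only one possible reading of
    "overlapping genes within 50 kb were considered together".
    """
    merged: List[Gene] = []
    # Sort by chromosome and position so only the last merged block needs checking.
    for gene in sorted(genes, key=lambda g: (g.chrom, g.start, g.end)):
        last = merged[-1] if merged else None
        if last and last.chrom == gene.chrom and gene.start - last.end <= window:
            # Extend the current block and record both gene names.
            merged[-1] = Gene(
                last.chrom,
                last.start,
                max(last.end, gene.end),
                f"{last.name};{gene.name}",
            )
        else:
            merged.append(gene)
    return merged


# Toy coordinates, invented for the example.
toy_genes = [
    Gene("1", 1_000, 5_000, "A"),
    Gene("1", 40_000, 60_000, "B"),    # 35 kb after A: merged with A
    Gene("1", 200_000, 210_000, "C"),  # 140 kb after B: kept separate
    Gene("2", 1_000, 2_000, "D"),      # different chromosome: kept separate
]
for block in merge_nearby_genes(toy_genes):
    print(block)
```

Sorting first means each gene is compared only with the most recent block, so the pass over the genes is linear after the sort.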

## Example

The following commands run BAGHERA on the UK Biobank summary statistics for breast cancer,
using EUR LD scores and the Gencode annotation:

```
$ baghera-tool create-files -l data/eur_w_ld_chr/ -a data/gencode.v31lift37.basic.annotation.gtf -s data/ld_annotated_gencode_v31.csv -g data/genes_gencode_v31.csv
$ baghera-tool generate-snp-file -s data/C50.gwas.imputed_v3.both_sexes.tsv -i position_ukbb -o data/c50.snps.csv -a data/ld_annotated_gencode_v31.csv
$ baghera-tool gene-heritability data/c50.snps.csv data/results_normal_c50.csv data/summary_normal_c50.csv data/log_normal_c50.txt --sweeps 10000 --burnin 2500 --n-chains 4 --n-cores 4 -m normal
```
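For repeated runs, the three commands above can be chained from a small Python script. This is a minimal sketch, not part of BAGHERA: the helper names `build_commands` and `run_pipeline` and the `data_dir` parameter are hypothetical, while the subcommands, flags, and file names are copied from the example above. By default it only prints the commands (a dry run); executing them assumes `baghera-tool` is installed and the input files exist.

```python
import shlex
import subprocess
from typing import List


def build_commands(data_dir: str = "data") -> List[List[str]]:
    """Return the three example invocations as argument lists."""
    ld_annotation = f"{data_dir}/ld_annotated_gencode_v31.csv"
    snps = f"{data_dir}/c50.snps.csv"
    return [
        # Step 1: build the SNP annotation from LD scores and the GTF file.
        ["baghera-tool", "create-files",
         "-l", f"{data_dir}/eur_w_ld_chr/",
         "-a", f"{data_dir}/gencode.v31lift37.basic.annotation.gtf",
         "-s", ld_annotation,
         "-g", f"{data_dir}/genes_gencode_v31.csv"],
        # Step 2: annotate the GWAS summary statistics.
        ["baghera-tool", "generate-snp-file",
         "-s", f"{data_dir}/C50.gwas.imputed_v3.both_sexes.tsv",
         "-i", "position_ukbb",
         "-o", snps,
         "-a", ld_annotation],
        # Step 3: run the regression.
        ["baghera-tool", "gene-heritability",
         snps,
         f"{data_dir}/results_normal_c50.csv",
         f"{data_dir}/summary_normal_c50.csv",
         f"{data_dir}/log_normal_c50.txt",
         "--sweeps", "10000", "--burnin", "2500",
         "--n-chains", "4", "--n-cores", "4",
         "-m", "normal"],
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands())  # dry run: prints the commands only
```

Calling `run_pipeline(build_commands(), dry_run=False)` would execute the steps in sequence and stop at the first one that fails.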

## Workflow

Alongside BAGHERA, we provide a Snakemake workflow at https://github.com/stracquadaniolab/workflow-baghera, which includes sample data for testing our method.

## Authors

- Viola Fanfani (v.fanfani@sms.ed.ac.uk): maintainer.
- Giovanni Stracquadanio (giovanni.stracquadanio@ed.ac.uk)

## Citation

The landscape of the heritable cancer genome
Viola Fanfani, Luca Citi, Adrian L Harris, Francesco Pezzella and Giovanni Stracquadanio
Cancer Res March 17 2021 DOI: 10.1158/0008-5472.CAN-20-3348

```
@article {Fanfani2021,
author = {Fanfani, Viola and Citi, Luca and Harris, Adrian L and Pezzella, Francesco and Stracquadanio, Giovanni},
title = {The landscape of the heritable cancer genome},
elocation-id = {canres.3348.2020},
year = {2021},
doi = {10.1158/0008-5472.CAN-20-3348},
publisher = {American Association for Cancer Research},
issn = {0008-5472},
journal = {Cancer Research}
}
```

## Issues

We just released a major upgrade of the code; please report any issues.