https://github.com/stracquadaniolab/baghera
Bayesian Gene Heritability Analysis from GWAS summary statistics
- License: mit
- Topics: bayesian-inference, bioinformatics, biostatistics, gwas, pymc3
- Language: Python
- Homepage: https://baghera.readthedocs.io
# Bayesian Gene Heritability Analysis


The Bayesian Gene Heritability Analysis software (BAGHERA) estimates the contribution
to the heritability of a trait/disease of all the SNPs in the genome (genome-wide heritability)
and of those near protein-coding genes (gene-level heritability).

BAGHERA requires only summary statistics from a Genome-wide Association Study (GWAS),
LD scores calculated from a population matching the ethnicity of the GWAS cohort, and
a gene annotation file in GTF format.

## Installation
The easiest and fastest way to install BAGHERA is with conda:
```
$ conda install -c stracquadaniolab -c bioconda -c conda-forge baghera
```

Tutorial
---------------

A typical BAGHERA analysis consists of 3 steps:
1. Build a SNP annotation file, where SNPs are annotated to genes and are
assigned an LD score. We used precomputed LD scores
(https://github.com/bulik/ldsc), from the set of variants for the European
population in 1000 Genomes, and protein coding genes as annotated in Gencode
v31 (https://www.gencodegenes.org/releases/current.html). Overlapping genes
within 50Kb were considered together, obtaining a dataset of 15,000
non-overlapping genes. To build your own annotation files, you should run
the following command:
```
$ baghera-tool create-files -l -a -s -g
```

2. Annotate summary statistics with the SNP annotation built in step 1. We used summary statistics available at http://www.nealelab.is/uk-biobank and ran the command below:
```
$ baghera-tool generate-snp-file -s -i -o -a
```

3. Run the regression.
```
$ baghera-tool gene-heritability --sweeps --burnin --n-chains --n-cores -m
```

## Example
The commands below run BAGHERA on the UK Biobank summary statistics for breast cancer, using
EUR LD scores and the Gencode annotation.
```
$ baghera-tool create-files -l data/eur_w_ld_chr/ -a data/gencode.v31lift37.basic.annotation.gtf -s data/ld_annotated_gencode_v31.csv -g data/genes_gencode_v31.csv
$ baghera-tool generate-snp-file -s data/C50.gwas.imputed_v3.both_sexes.tsv -i position_ukbb -o data/c50.snps.csv -a data/ld_annotated_gencode_v31.csv
$ baghera-tool gene-heritability data/c50.snps.csv data/results_normal_c50.csv data/summary_normal_c50.csv data/log_normal_c50.txt --sweeps 10000 --burnin 2500 --n-chains 4 --n-cores 4 -m normal
```

## Workflow
Alongside BAGHERA, we provide a Snakemake workflow (https://github.com/stracquadaniolab/workflow-baghera), which includes sample data to test our method.
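
For a quick run without Snakemake, the three commands from the Example section could also be chained from a small Python script. The sketch below is only an illustration and is not part of BAGHERA: `build_commands` and `run_pipeline` are hypothetical helper names, the flags and file names are copied verbatim from the Example, and it assumes `baghera-tool` is available on the `PATH`. By default it performs a dry run and only prints the commands.

```python
import shlex
import shutil
import subprocess


def build_commands(data_dir="data"):
    """Return the three Example commands as argv lists (hypothetical helper)."""
    d = data_dir.rstrip("/")
    # Output of step 1, reused as the -a input of step 2.
    ld_annotated = f"{d}/ld_annotated_gencode_v31.csv"
    # Output of step 2, reused as the first positional argument of step 3.
    snps = f"{d}/c50.snps.csv"

    create_files = [
        "baghera-tool", "create-files",
        "-l", f"{d}/eur_w_ld_chr/",
        "-a", f"{d}/gencode.v31lift37.basic.annotation.gtf",
        "-s", ld_annotated,
        "-g", f"{d}/genes_gencode_v31.csv",
    ]
    generate_snp_file = [
        "baghera-tool", "generate-snp-file",
        "-s", f"{d}/C50.gwas.imputed_v3.both_sexes.tsv",
        "-i", "position_ukbb",
        "-o", snps,
        "-a", ld_annotated,
    ]
    gene_heritability = [
        "baghera-tool", "gene-heritability",
        snps,
        f"{d}/results_normal_c50.csv",
        f"{d}/summary_normal_c50.csv",
        f"{d}/log_normal_c50.txt",
        "--sweeps", "10000", "--burnin", "2500",
        "--n-chains", "4", "--n-cores", "4",
        "-m", "normal",
    ]
    return [create_files, generate_snp_file, gene_heritability]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    if not dry_run and shutil.which("baghera-tool") is None:
        raise RuntimeError("baghera-tool not found on PATH; install BAGHERA first")
    for argv in commands:
        print("$ " + shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands())
```

Calling `run_pipeline(build_commands(), dry_run=False)` would execute the steps in order; each step consumes the file written by the previous one, so a failure stops the chain rather than running the regression on stale inputs.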
## Authors
- Viola Fanfani (v.fanfani@sms.ed.ac.uk): maintainer.
- Giovanni Stracquadanio (giovanni.stracquadanio@ed.ac.uk)

## Citation
The landscape of the heritable cancer genome
Viola Fanfani, Luca Citi, Adrian L Harris, Francesco Pezzella and Giovanni Stracquadanio
Cancer Res March 17 2021 DOI: 10.1158/0008-5472.CAN-20-3348

```
@article {Fanfani2021,
author = {Fanfani, Viola and Citi, Luca and Harris, Adrian L and Pezzella, Francesco and Stracquadanio, Giovanni},
title = {The landscape of the heritable cancer genome},
elocation-id = {canres.3348.2020},
year = {2021},
doi = {10.1158/0008-5472.CAN-20-3348},
publisher = {American Association for Cancer Research},
issn = {0008-5472},
journal = {Cancer Research}
}
```

## Issues
We just released a major upgrade of the code; please report any issues.