Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mancusolab/FactorGo
Software to infer latent pleiotropic components from GWAS summary data
https://github.com/mancusolab/FactorGo
gwas gwas-summary-statistics jax variational-inference
Last synced: 19 days ago
JSON representation
Software to infer latent pleiotropic components from GWAS summary data
- Host: GitHub
- URL: https://github.com/mancusolab/FactorGo
- Owner: mancusolab
- License: mit
- Created: 2022-02-16T23:11:00.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-24T05:35:40.000Z (8 months ago)
- Last Synced: 2024-04-24T12:38:42.260Z (8 months ago)
- Topics: gwas, gwas-summary-statistics, jax, variational-inference
- Language: Python
- Homepage:
- Size: 527 KB
- Stars: 9
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE.txt
- Authors: AUTHORS.rst
Awesome Lists containing this project
- awesome-complex-trait-genetics - FactorGO
- awesome-complex-trait-genetics - FactorGO
README
[![PyPI-Server](https://img.shields.io/pypi/v/traceax.svg)](https://pypi.org/project/traceax/)
[![Github](https://img.shields.io/github/stars/mancusolab/traceax?style=social)](https://github.com/mancusolab/traceax)
[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)# FactorGo
``FactorGo`` is a scalable variational factor analysis model that learns pleiotropic factors using GWAS summary statistics!
We present **Factor** analysis model in **G**enetic ass**O**ciation (FactorGo) to learn latent
pleiotropic factors using GWAS summary statistics. Our model is implemented using `Just-in-time` (JIT)
via [JAX](https://github.com/google/jax) in python, which generates and compiles heavily optimized
C++ code in real time and operates seamlessly on CPU, GPU or TPU. FactorGo is a command line tool and
please see example below and full documentation.For pubished paper, please see:
Zhang, Z., Jung, J., Kim, A., Suboc, N., Gazal, S., and Mancuso, N. (2023). A scalable approach to characterize pleiotropy across thousands of human diseases and complex traits using GWAS summary statistics. _Am. J. Hum. Genet._ 110, 1863–1874.
(https://www.cell.com/ajhg/abstract/S0002-9297(23)00353-1)We are currently working on more detailed documentations. Feel free to contact me ([email protected]) if you need help on running our tool and further analysis. I am happy to schedule zoom call if needed.
[**Installation**](#installation)
| [**Example**](#example)
| [**Notes**](#notes)
| [**Support**](#support)
| [**Other Software**](#other-software)## FactorGo model
FactorGo assumes the true genetic effect can be decomposed into latent pleiotropic factors.
Briefly, we model test statistics at $p$ independent variants from the ith GWAS $Z_i \\approx \\sqrt{N}_i \\hat{\\beta}_i$ as a
linear combination of $k$ shared latent variant loadings $L \\in R^{p \\times k}$ with trait-specific factor scores $f_i \\in R^{k \\times 1}$ as$$Z_i = \\sqrt{N}_i \\beta_i + \\epsilon_i = \\sqrt{N}_i (L f_i + \\mu) + \\epsilon_i $$
where $N_i$ is the sample size for the $i^{th}$ GWAS , $\\mu$ is the intercept and $\\epsilon_i \\sim N(0, \\tau^{-1}I_p)$ reflects residual
heterogeneity in statistical power across studies with precision scalar .
Given $Z = \\{Z_i\\}^n_{i=1}$, and model parameters $L$, $F$, $\\mu$, $\\tau$, we can compute the likelihood as$$\\mathcal{L}(L, F, \\mu, \\tau | Z) = \\prod_i \\mathcal{N}_p ( \\sqrt{N_i} (L f_i + \\mu), \\tau^{-1} I_p)$$
To model our uncertainty in $L$, $F$, $\\mu$, we take a full Bayesian approach in the lower dimension latent space
similar to a Bayesian PCA model [1]_ as,$$\Pr(F) = \\prod_{i=1}^{n} \\mathcal{N}_k (f_i | 0, I_k)$$
$$\Pr(L | \\alpha) = \\prod_{j=1}^{p} \\mathcal{N}_k (l^j | 0, diag(\\alpha^{-1}))$$
$$\Pr(\\mu) = \\mathcal{N}_p (\\mu | 0, \\phi^{-1} I_p)$$
where $\\alpha \\in R^{k \\times 1}_{>0} (\\phi > 0)$ controls the prior precision for variant loadings (intercept). To avoid overfitting,
and “shut off” uninformative factors when $k$ is misspecified, we use automatic relevance determination (ARD) [1]_
and place a prior over $\\alpha$ as$$\Pr(\\alpha | \\alpha_a, \\alpha_b) = \\prod_{q=1}^{k} G(\\alpha_q | \\alpha_a, \\alpha_b)$$
$$\Pr(\\tau | \\tau_a, \\tau_b) = G(\\tau | \\tau_a, \\tau_b)$$
Lastly, we place a prior over the shared residual variance across GWAS studies as $\\tau \\sim G(a , b)$.
We impose broad priors by setting hyperparameters $\\phi = a_k = b_k= a_{\\tau} = b_{\\tau} = 10^{-5}$.## Installation
We recommend first create a conda environment and have `pip` installed.
```bash
# download use http address
git clone https://github.com/mancusolab/FactorGo.git
# or use ssh agent
git clone [email protected]:mancusolab/FactorGo.gitcd factorgo
pip install .
```## Example
For iilustration, we use example data stored in `/example/data`,
including Z score summary statistics file and sample size file.To run ``factorgo`` command line tool, we specify the following input files and flags:
* GWAS Zscore file: n20_p1k.Zscore.tsv.gz
* Sample size file: n20_p1k.SampleN.tsv
* -k 5: estimate 5 latent factors
* --scale: the snp columns of Zscore matrix is center and standardized
* -o: output directory and prefixFor all available flags, please use ``factorgo -h``.
```bash
factorgo \
./example/data/n20_p1k.Zscore.tsv.gz \
./example/data/n20_p1k.SampleN.tsv \
-k 5 \
--scale \
-o ./example/result/demo_test
```The output contains five result files:
1. demo_test.Wm.tsv.gz: posterior mean of loading matrix W (pxk)
2. demo_test.Zm.tsv.gz: posterior mean of factor score Z (nxk)
3. demo_test.Wvar.tsv.gz: posterior variance of loading matrix W (kx1)
4. demo_test.Zvar.tsv.gz: posterior variance of factor score Z (nxk)
5. demo_test.factor.tsv.gz: contains the following three columns| a) factor index (ordered by R2),
| b) posterior mean of ARD precision parameters,
| c) variance explained by each factor (R2)## Notes
The default computation device for ``factorgo`` is CPU. To switch to GPU device, you can specify the platform (cpu/gpu/tpu) using the flag `-p gpu`
for example:```bash
factorgo \
./example/data/n20_p1k.Zscore.tsv.gz \
./example/data/n20_p1k.SampleN.tsv \
-k 5 \
--scale \
-p gpu \ # use gpu device
-o ./example/result/demo_test
`````factorgo`` uses [JAX](https://github.com/google/jax) with [Just In Time](https://jax.readthedocs.io/en/latest/jax-101/02-jitting.html) compilation to achieve high-speed computation.
However, there are some [issues](https://github.com/google/jax/issues/5501) for JAX with Mac M1 chip.
To solve this, users need to initiate conda using [miniforge](https://github.com/conda-forge/miniforge), and then install ``factorgo`` using ``pip`` in the desired environment.## References
[1] Bishop, C.M. (1999). Variational principal components. 509–514.
## Support
Please report any bugs or feature requests in the [Issue Tracker](https://github.com/mancusolab/FactorGo/issues>).
If you have any questions or comments please contact [email protected] and/or [email protected].## Other Softwares
Feel free to use other software developed by [Mancuso
Lab](https://www.mancusolab.com/):- [SuShiE](https://github.com/mancusolab/sushie): a Bayesian
fine-mapping framework for molecular QTL data across multiple
ancestries.
- [MA-FOCUS](https://github.com/mancusolab/ma-focus): a Bayesian
fine-mapping framework using
[TWAS](https://www.nature.com/articles/ng.3506) statistics across
multiple ancestries to identify the causal genes for complex traits.
- [SuSiE-PCA](https://github.com/mancusolab/susiepca): a scalable
Bayesian variable selection technique for sparse principal component
analysis
- [twas_sim](https://github.com/mancusolab/twas_sim): a Python
software to simulate [TWAS](https://www.nature.com/articles/ng.3506)
statistics.
- [HAMSTA](https://github.com/tszfungc/hamsta): a Python software to
estimate heritability explained by local ancestry data from
admixture mapping summary statistics.## Note
This project has been set up using PyScaffold 4.1.1. For details and usage
information on PyScaffold see https://pyscaffold.org/.