https://github.com/borgwardtlab/lmm-lasso
An implementation of the Lasso model for association mapping and phenotype prediction that corrects for population structure (Rakitsch et al., Bioinformatics 2013): http://goo.gl/FRmXwI
- Host: GitHub
- URL: https://github.com/borgwardtlab/lmm-lasso
- Owner: BorgwardtLab
- Created: 2016-10-13T19:11:27.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-10-14T11:42:31.000Z (over 9 years ago)
- Last Synced: 2023-10-20T18:23:27.252Z (over 2 years ago)
- Language: Python
- Homepage:
- Size: 205 KB
- Stars: 6
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# LMM-lasso
An implementation of the linear model with sparsity constraints (Lasso model) for association mapping and phenotype prediction that corrects for population structure, as described in:
* B. Rakitsch, C. Lippert, O. Stegle, K. Borgwardt (2013)
**A Lasso Multi-Marker Mixed Model for Association Mapping with Population Structure Correction**,
_Bioinformatics_ 29(2):206-1 [link](http://bioinformatics.oxfordjournals.org/content/29/2/206)
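A minimal sketch of the idea, assuming the standard mixed-model setup: the phenotype is modeled as `y = X w + u + e`, where `u ~ N(0, sigma_g^2 K)` is a random effect capturing population structure (`K` is a relatedness matrix) and `e ~ N(0, sigma_e^2 I)` is noise. For a fixed ratio `delta = sigma_e^2 / sigma_g^2`, rotating by the eigenvectors of `K` and rescaling makes the residuals independent, after which an ordinary Lasso can be fitted. The sketch takes `delta` as given (in practice it has to be estimated, for example from the mixed model without marker effects), omits an intercept, and uses a small hand-written coordinate-descent solver. The names `lmm_rotate`, `lasso_cd`, and `lmm_lasso_sketch` are hypothetical and are not the interface of `code/lmm_lasso.py`.

```python
import numpy as np


def lmm_rotate(X, y, K, delta):
    """Decorrelate X and y under a covariance proportional to K + delta * I.

    With K = U diag(S) U^T, rotating by U^T and scaling row i by
    1 / sqrt(S[i] + delta) turns residuals with that covariance into
    i.i.d. residuals, so an ordinary Lasso can be fitted afterwards.
    """
    S, U = np.linalg.eigh(K)            # K is symmetric positive semi-definite
    S = np.clip(S, 0.0, None)           # drop tiny negative eigenvalues from round-off
    scale = 1.0 / np.sqrt(S + delta)
    return scale[:, None] * (U.T @ X), scale * (U.T @ y)


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent Lasso for 0.5/n * ||y - X w||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)        # residual y - X w, starting from w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (excluding j)
            rho = X[:, j] @ r / n + col_sq[j] * w[j]
            # soft-thresholding update for coordinate j
            w_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r += X[:, j] * (w[j] - w_new)
            w[j] = w_new
    return w


def lmm_lasso_sketch(X, y, K, delta, alpha):
    """Hypothetical helper: rotate by the mixed-model covariance, then run a plain Lasso."""
    X_rot, y_rot = lmm_rotate(X, y, K, delta)
    return lasso_cd(X_rot, y_rot, alpha)


# Toy example (simulated, hypothetical): two causal markers plus a random
# effect with covariance K (sigma_g^2 = 1) and noise variance 0.25,
# so delta = 0.25 / 1 = 0.25.
rng = np.random.default_rng(0)
n, p, k = 200, 50, 20
X = rng.standard_normal((n, p))
G = rng.standard_normal((n, k))
K = G @ G.T / k
w_true = np.zeros(p)
w_true[[3, 17]] = [1.5, -2.0]
u = G @ rng.standard_normal(k) / np.sqrt(k)       # u ~ N(0, K)
y = X @ w_true + u + 0.5 * rng.standard_normal(n)
w_hat = lmm_lasso_sketch(X, y, K, delta=0.25, alpha=0.1)
print(np.flatnonzero(np.abs(w_hat) > 0.5))        # indices of large coefficients
```

Because the rotation only depends on `K` and `delta`, it can be computed once and reused for every value of the Lasso penalty `alpha`, which keeps the per-fit cost equal to that of a standard Lasso.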
## Usage
The folder `code` contains several Python scripts:
* `lmm_lasso.py` : contains the implementation of the proposed model
* `test.py` : an example script
In a terminal, run:
```
python code/test.py
```
The plots are saved in a separate folder named `plots`.
## Data
The `code` folder contains a subfolder `data` with the following files:
* `poppheno.csv`
* `genotypes.csv`
The data consist of 1000 randomly sampled SNPs from the *A. thaliana* genotype data of [Atwell et al. (2010)](http://www.nature.com/nature/journal/v465/n7298/full/nature08800.html). To simulate population-driven effects, we used the real phenotype "leaf number at flowering time". Univariate analyses, as performed in the original paper, have shown that this phenotype has an excess of associations when population structure is not accounted for; after correction, the p-values are approximately uniformly distributed.
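To make "an excess of associations" concrete, the sketch below runs a naive per-SNP regression on simulated data and summarizes p-value calibration with the genomic-control inflation factor `lambda` (median chi-square statistic divided by its expected median, about 0.455); `lambda` close to 1 corresponds to roughly uniform null p-values. Everything here is an illustrative assumption: the simulation, the helper names (`univariate_scan`, `inflation_factor`, `remove_group_means`), and the correction, which simply removes per-subpopulation means using known labels. It is not the analysis of Atwell et al., does not read the repository's CSV files, and is not the mixed-model correction used by LMM-Lasso.

```python
import math

import numpy as np


def univariate_scan(X, y):
    """Per-SNP simple linear regression; returns z-scores and two-sided p-values.

    No population-structure correction is applied. The t statistic is treated
    as standard normal, which is reasonable for a few hundred samples.
    Assumes every SNP is polymorphic in the sample (non-zero variance).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    ssx = (Xc ** 2).sum(axis=0)
    beta = Xc.T @ yc / ssx
    resid_var = ((yc[:, None] - Xc * beta) ** 2).sum(axis=0) / (n - 2)
    z = beta / np.sqrt(resid_var / ssx)
    p = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in z])
    return z, p


def inflation_factor(z):
    """Genomic-control lambda: median chi^2(1) statistic divided by its expected median."""
    return float(np.median(z ** 2) / 0.4549364)


def remove_group_means(A, groups):
    """Subtract per-group means (a simple fixed-effect correction for known groups)."""
    A = np.array(A, dtype=float)  # work on a copy
    for g in np.unique(groups):
        A[groups == g] -= A[groups == g].mean(axis=0)
    return A


# Toy data: two subpopulations with differentiated allele frequencies and
# homozygous 0/1 genotypes; the phenotype differs between subpopulations,
# but no SNP has a causal effect.
rng = np.random.default_rng(1)
n, m = 400, 1000
pop = np.repeat([0, 1], n // 2)
freqs = rng.uniform(0.1, 0.9, size=(2, m))
X = (rng.random((n, m)) < freqs[pop]).astype(float)
y = rng.standard_normal(n) + 1.0 * pop

z_raw, p_raw = univariate_scan(X, y)
z_adj, p_adj = univariate_scan(remove_group_means(X, pop), remove_group_means(y, pop))
print(f"lambda uncorrected: {inflation_factor(z_raw):.2f}")
print(f"lambda corrected:   {inflation_factor(z_adj):.2f}")
```

With this setup the uncorrected `lambda` should come out well above 1, because SNPs that merely differ in frequency between subpopulations pick up the population-level phenotype difference, while the corrected `lambda` should be close to 1.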