https://github.com/darkeyes/ipadmixture

A data clustering package based on admixture ratios (Q matrix) of population structure analysis.

Host: GitHub
URL: https://github.com/darkeyes/ipadmixture
Owner: DarkEyes
License: gpl-3.0
Created: 2020-02-29T13:32:42.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2025-05-07T09:27:23.000Z (about 1 year ago)
Last Synced: 2025-10-12T14:55:04.214Z (9 months ago)
Topics: admixture, bioinformatics, data-clustering-algorithm, population-stratification, population-structure, r
Language: R
Homepage:
Size: 2.67 MB
Stars: 5
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: LICENSE


ipADMIXTURE: Iterative Pruning Population Admixture Inference Framework

==========================================================

[![minimal R version](https://img.shields.io/badge/R%3E%3D-3.5.0-6666ff.svg)](https://cran.r-project.org/)

[![CRAN Status Badge](https://www.r-pkg.org/badges/version-last-release/ipADMIXTURE)](https://cran.r-project.org/package=ipADMIXTURE)

[![Download](https://cranlogs.r-pkg.org/badges/grand-total/ipADMIXTURE)](https://cran.r-project.org/package=ipADMIXTURE)

[![bioRxiv](https://img.shields.io/badge/bioRxiv-2020.03.21.001206-B31B1B)](https://doi.org/10.1101/2020.03.21.001206)

[![License](https://img.shields.io/badge/License-GPL%203-orange.svg)](https://spdx.org/licenses/GPL-3.0-only.html)

 A data clustering package based on admixture ratios (Q matrix) of population structure.

 

The framework is based on an iterative pruning procedure that clusters data by repeatedly splitting a given population into subclusters until a stopping criterion is met, in the same way as the ipPCA, iNJclust, and IPCAPS frameworks.

 

The package also provides a function that infers a phylogenetic tree by constructing a neighbor-joining tree from a similarity matrix between clusters.

Given multiple Q matrices with varying numbers of ancestors (K), the framework defines the similarity value between clusters i and j as the minimum number K* at which the majority of members of the two clusters fall into different clusters. This K* reflects the minimum number of ancestors needed to split clusters i and j apart when individuals are assigned to K* clusters based on their maximum admixture ratio.
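
The definition of K* can be illustrated with a small base-R sketch on made-up data. This is not the package's implementation: `min_separating_K` is a hypothetical helper, the Q matrices are toy values, and "majority" is interpreted here as the most common maximum-ratio assignment within each cluster.

```r
# Toy Q matrices for K = 2 and K = 3 (rows = individuals, columns = ancestors).
Q2 <- rbind(c(0.9, 0.1), c(0.8, 0.2), c(0.7, 0.3), c(0.6, 0.4))
Q3 <- rbind(c(0.8, 0.1, 0.1), c(0.7, 0.2, 0.1),
            c(0.2, 0.1, 0.7), c(0.3, 0.1, 0.6))
Qlist <- list(Q2, Q3)
Ks    <- c(2, 3)            # K value of each element in Qlist
cls   <- c(1, 1, 2, 2)      # cluster ID of each individual

# Hypothetical helper: smallest K at which the most common max-ratio
# assignment of cluster i differs from that of cluster j.
min_separating_K <- function(Qlist, Ks, cls, i, j) {
  for (idx in seq_along(Qlist)) {
    # Assign each individual to the ancestor with the largest admixture ratio.
    assigned <- max.col(Qlist[[idx]], ties.method = "first")
    # Most common assignment within each cluster.
    top_i <- names(which.max(table(assigned[cls == i])))
    top_j <- names(which.max(table(assigned[cls == j])))
    if (top_i != top_j) return(Ks[idx])
  }
  NA  # the two clusters were never separated in the supplied range of K
}

k_star <- min_separating_K(Qlist, Ks, cls, 1, 2)
```

In this toy case all four individuals share the same dominant ancestor at K = 2, and the two clusters only separate at K = 3, so `k_star` is 3.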

 

Installation

------------

You can install our package from CRAN.

```r

install.packages("ipADMIXTURE")

```

To install the newest version from GitHub, run the following command in the R console.

``` r

remotes::install_github("DarkEyes/ipADMIXTURE")

```

This requires the "remotes" package to be installed before installing ipADMIXTURE.

EXAMPLE

----------------------------------------------------------------------------------

This example uses the human 27-population dataset published by Xing, J., et al. (2009), which consists of 544 individuals from 27 populations. The Q matrices for this dataset are provided in the package. The following steps show a simple way to use the package.

Step1: running ipADMIXTURE on the human 27-population dataset with the number of ancestors K = 12.

```{r}

library(ipADMIXTURE)

# ipADMIXTURE::human27pop_Qmat[[i]] is the Q matrix with K = i + 1, so index 11 gives K = 12

h27pop_obj<-ipADMIXTURE(Qmat=ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs =0.15)

```
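
The indexing convention in the comment above (element i holds the Q matrix for K = i + 1) can be sketched with a toy list. `toy_Qlist` is a made-up stand-in for illustration, not the packaged `human27pop_Qmat` data.

```r
# Toy stand-in: element i is a matrix with K = i + 1 columns (K = 2, ..., 12).
toy_Qlist <- lapply(2:12, function(K) matrix(1 / K, nrow = 3, ncol = K))

# To select the Q matrix for a given K, subtract 1 from K to get the list index.
K   <- 12
Q_K <- toy_Qlist[[K - 1]]
ncol(Q_K)  # number of ancestors in the selected matrix
```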

Step2: printing all cluster information in text mode.

```{r}

ipADMIXTURE::printClustersFromLabels(h27pop_obj,human27pop_labels)

```

The output looks like this:

```

[1] "Overall labels"

[1] "==============="

[1] "Alur(10)Hema(15)Pygmy(25)Brahmin(25)Utah_N._European(25)Cambodian(5)Chinese(10)Tamil_LC(13)Irula(24)JPN2(13)Madiga(10)Mala(11)CEU(60)YRI(60)CHB(45)JPT(45)Luhya(24)Tuscan(25)Kung(13)Pedi(10)Sotho/Tswana(8)Stalskoe(5)Iban(25)TBrahmin(14)Urkarah(18)VN(7)Nguni(9)"

[1] "==============="

[1] "ID1, md0.05, N25"

[1] "Pygmy(25/25)"

[1] "==============="

[1] "ID2, md0.13, N56"

[1] "JPN2(12/13)JPT(44/45)"

[1] "==============="

[1] "ID3, md0.00, N12"

[1] "Kung(12/13)"

[1] "==============="

[1] "ID4, md0.00, N25"

[1] "Iban(25/25)"

[1] "==============="

[1] "ID5, md0.00, N69"

[1] "Cambodian(5/5)Chinese(10/10)JPN2(1/13)CHB(45/45)JPT(1/45)VN(7/7)"

[1] "==============="

[1] "ID6, md0.06, N25"

[1] "Utah_N._European(1/25)Tuscan(24/25)"

[1] "==============="

[1] "ID7, md0.09, N85"

[1] "Utah_N._European(24/25)CEU(60/60)Tuscan(1/25)"

[1] "==============="

[1] "ID8, md0.00, N17"

[1] "Urkarah(17/18)"

[1] "==============="

[1] "ID9, md0.00, N6"

[1] "Stalskoe(5/5)Urkarah(1/18)"

[1] "==============="

[1] "ID10, md0.00, N4"

[1] "Irula(4/24)"

[1] "==============="

[1] "ID11, md0.00, N10"

[1] "Irula(10/24)"

[1] "==============="

[1] "ID12, md0.00, N9"

[1] "Irula(9/24)"

[1] "==============="

[1] "ID13, md0.00, N33"

[1] "Tamil_LC(13/13)Madiga(9/10)Mala(11/11)"

[1] "==============="

[1] "ID14, md0.08, N41"

[1] "Brahmin(25/25)Irula(1/24)Madiga(1/10)TBrahmin(14/14)"

[1] "==============="

[1] "ID15, md0.00, N4"

[1] "Pedi(2/10)Sotho/Tswana(2/8)"

[1] "==============="

[1] "ID16, md0.00, N20"

[1] "Pedi(5/10)Sotho/Tswana(6/8)Nguni(9/9)"

[1] "==============="

[1] "ID17, md0.00, N4"

[1] "Kung(1/13)Pedi(3/10)"

[1] "==============="

[1] "ID18, md0.04, N60"

[1] "YRI(60/60)"

[1] "==============="

[1] "ID19, md0.00, N4"

[1] "Hema(2/15)Luhya(2/24)"

[1] "==============="

[1] "ID20, md0.00, N2"

[1] "Luhya(2/24)"

[1] "==============="

[1] "ID21, md0.07, N20"

[1] "Luhya(20/24)"

[1] "==============="

[1] "ID22, md0.12, N23"

[1] "Alur(10/10)Hema(13/15)"

```

Clusters are separated from one another by "===============". The first line of each cluster's details has the form "IDx, md0.xx, Nx", and the second line lists the populations in the cluster according to the ground-truth labels.

For example,

[1] "ID19, md0.00, N4"

[1] "Hema(2/15)Luhya(2/24)".

This is cluster ID19, whose maximum magnitude difference of admixture ratios (md) is 0.00 and which contains 4 individuals. In the second line, 2 of the 15 Hema individuals and 2 of the 24 Luhya individuals belong to this cluster.
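
The "Population(n/total)" notation can be reproduced with a few lines of base R. The sketch below uses made-up labels and cluster IDs, and `summarize_cluster` is a hypothetical helper written for illustration, not the code behind `printClustersFromLabels`.

```r
# Toy data: a ground-truth population label and a cluster ID per individual.
labels   <- c("Hema", "Hema", "Hema", "Luhya", "Luhya", "Luhya", "Luhya")
clusters <- c(1, 1, 2, 1, 2, 2, 2)

# Hypothetical helper: format one cluster as "Pop(n_in_cluster/n_total)...".
summarize_cluster <- function(labels, clusters, id) {
  total  <- table(labels)                   # population sizes overall
  inside <- table(labels[clusters == id])   # population counts in this cluster
  pops   <- names(inside)
  paste0(pops, "(", as.vector(inside), "/", as.vector(total[pops]), ")",
         collapse = "")
}

cluster1 <- summarize_cluster(labels, clusters, 1)
print(cluster1)
```

For this toy input, cluster 1 contains 2 of the 3 Hema individuals and 1 of the 4 Luhya individuals, so the string reads "Hema(2/3)Luhya(1/4)".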

Step3: plotting admixture ratios and clustering assignment.

```{r}

ipADMIXTURE::plotAdmixClusters(h27pop_obj)

```



Step4: plotting clustering information as a treemap plot.

```{r}

ipADMIXTURE::plotClusterLeaves(h27pop_obj)

```



Step5: Inferring phylogenetic tree of clusters based on a list of Q matrices that varies K using neighbor-joining (NJ) method. 

```{r}

out<-ipADMIXTURE::getPhyloTree(human27pop_Qmat,h27pop_obj$indexClsVec)

plot(out$tree,type = "unrooted")

```



The leaf nodes are cluster IDs.

Creating Q matrix from .geno file using R

---------------------------------------------------

There are two well-known software products for obtaining a Q matrix: ADMIXTURE and STRUCTURE. However, if you want to do everything in R, the following approach works.

We can use the LEA package to compute a Q matrix from a .geno file. If you have never installed Bioconductor, you should first run the following code.

```{r}

if (!requireNamespace("BiocManager", quietly = TRUE))

    install.packages("BiocManager")

```

You can then install the LEA package with BiocManager, as shown below.

```{r}

BiocManager::install("LEA")

```

Suppose we have "yourfile.geno" and want the Q matrix for 4 ancestors. We can then run the following code.

```{r}

library(LEA)

K=4

obj.snmf = LEA::snmf(input.file="yourfile.geno", K = K, project = "new")

Qmat = LEA::Q(obj.snmf, K = K)

```
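
Whichever tool produces the Q matrix, a quick sanity check can be useful before clustering. The sketch below uses a made-up 3-by-4 matrix in place of real output and assumes the usual convention that rows are individuals, columns are ancestors, and each row of admixture ratios sums to 1.

```r
# Toy Q matrix: 3 individuals, K = 4 ancestors (made-up values).
Qmat <- rbind(c(0.70, 0.10, 0.10, 0.10),
              c(0.05, 0.85, 0.05, 0.05),
              c(0.25, 0.25, 0.40, 0.10))

# Each row should sum to 1, up to floating-point error.
rows_ok <- all(abs(rowSums(Qmat) - 1) < 1e-8)

# Index of the ancestor with the largest admixture ratio for each individual.
top_ancestor <- max.col(Qmat, ties.method = "first")

print(rows_ok)
print(top_ancestor)
```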

Citation

----------------------------------------------------------------------------------

- Chainarong Amornbunchornvej, Pongsakorn Wangkumhang, and Sissades Tongsima (2020). ipADMIXTURE: R package for inferring sub-population clusters based on genetic admixture.

bioRxiv 2020.03.21.001206; doi: https://doi.org/10.1101/2020.03.21.001206

Contact

----------------------------------------------------------------------------------

- Developer: C. Amornbunchornvej
https://orcid.org/0000-0003-3131-0370

- Strategic Analytics Networks with Machine Learning and AI (SAI), NECTEC, Thailand

- Homepage: Link
