https://github.com/rdevito/MSFA

Host: GitHub
URL: https://github.com/rdevito/MSFA
Owner: rdevito
Created: 2020-03-14T14:22:52.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2022-04-28T11:42:09.000Z (about 2 years ago)
Last Synced: 2024-02-25T06:33:36.021Z (4 months ago)
Language: R
Size: 60.6 MB
Stars: 1
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

Lists

awesome-multi-omics - MSFA - De Vito - multi-study factor analysis: same features, different samples - [paper](https://arxiv.org/abs/1611.06350) (Software packages and methods / Multi-study correlation or factor analysis)

README

# MSFA

author: Roberta De Vito, Ruggero Bellio

Fits the Multi-Study Factor Analysis (MSFA) model via the [ECM algorithm](#1-fitting-a-msfa-model-via-the-ecm-algorithm) and via [a Bayesian approach](#2-bayesian-analysis-of-a-msfa-model).

## 1 Fitting a MSFA model via the ECM Algorithm

The following example illustrates how to fit a MSFA model via the ECM algorithm, using a data set available in the Bioconductor repository (www.bioconductor.org).

### Getting the data

Some pre-processing is required to get the data into a form suitable for the analysis. This was already done, and the resulting data frame is saved into the `data_immune` object. The commands that were used to form it are included in the help file for the data object.

```{r help2, echo = TRUE, results = TRUE, tidy = TRUE}
library(MSFA)
data(data_immune)   # load the pre-processed data object
help(data_immune)   # help page listing the pre-processing commands
```

### Obtaining suitable starting values for model parameters

Then we get suitable starting values for the model parameters, selecting $K=3$ common factors and (3, 4) study-specific factors for the two studies, respectively.

```{r starting values, message = FALSE}
# k: number of common factors; j_s: number of study-specific factors per study
start_value <- start_msfa(X_s = data_immune, k = 3, j_s = c(3, 4))
```
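To make the roles of `k` and `j_s` concrete, here is a minimal base R sketch that simulates two studies with 3 shared factors and 3 and 4 study-specific factors. It assumes the usual multi-study factor decomposition, in which each observation is the sum of a common part (loadings `Phi`), a study-specific part (loadings `Lambda`), and independent noise. The sample sizes, the number of variables, and all object names are illustrative and are not taken from `data_immune`.

```r
set.seed(1)

p   <- 20          # number of variables, shared by all studies (illustrative)
n_s <- c(50, 60)   # sample sizes of the two studies (illustrative)
k   <- 3           # number of common factors
j_s <- c(3, 4)     # number of study-specific factors

# Common loadings: one p x k matrix shared by every study
Phi <- matrix(rnorm(p * k), p, k)

# Simulate one study: rows are samples, columns are variables
simulate_study <- function(n, j) {
  Lambda <- matrix(rnorm(p * j), p, j)        # study-specific loadings
  f <- matrix(rnorm(n * k), n, k)             # common factor scores
  l <- matrix(rnorm(n * j), n, j)             # study-specific factor scores
  e <- matrix(rnorm(n * p, sd = 0.5), n, p)   # idiosyncratic noise
  f %*% t(Phi) + l %*% t(Lambda) + e
}

# A list with one matrix per study, all with the same p columns
X_sim <- Map(simulate_study, n_s, j_s)
sapply(X_sim, dim)
```

The result is a list with one matrix per study: the columns (variables) are the same across studies, while the rows (samples) differ.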

### Fitting the model via ECM

Now everything is in place for estimating the model parameters via the ECM algorithm:

```{r get estimate, results = FALSE}
# Maximum likelihood fit, started from the values computed above
mle <- ecm_msfa(data_immune, start_value, trace = FALSE)
```

The estimated matrix of common loadings can be represented by a suitable heatmap:

```{r heatmap, fig.show = 'hold', fig.width = 7.5, fig.height = 6.5, message = FALSE}
library(gplots)
# Heatmap of the common loadings, with no clustering or reordering
heatmap.2(mle$Phi, dendrogram = "none", Rowv = FALSE, Colv = FALSE,
          trace = "none", density.info = "none", col = heat.colors(256))
```

---

## 2 Bayesian Analysis of a MSFA model

The following example illustrates a Bayesian analysis of a MSFA model. Although the methodology was developed for the $p>n$ case, for the sake of simplicity we illustrate the analysis on the same data set used for maximum likelihood estimation. The data set is available in the Bioconductor repository (www.bioconductor.org).

### Getting the data

Some pre-processing is required to get the data into a form suitable for the analysis. This was already done, and the resulting data frame is saved into the `data_immune` object. The commands that were used to form it are included in the help file for the data object.

```{r help3, echo = TRUE, results = TRUE, tidy = TRUE}
library(MSFA)
data(data_immune)   # load the pre-processed data object
help(data_immune)   # help page listing the pre-processing commands
```

### Sampling from the posterior distribution

We first estimate a model with deliberately large loading matrices, setting the dimension to 10 for both the common factor loadings and the study-specific loadings. To get reproducible results, we set the random seed.

```{r posterior, message = FALSE}
set.seed(1971)
# 10 common factors and 10 study-specific factors for each of the two studies
out10_1010 <- sp_msfa(data_immune, k = 10, j_s = c(10, 10), trace = FALSE)
```

We take the posterior median as the estimate of $\Sigma_\Phi = \Phi\Phi^\top$, and likewise for the two study-specific matrices $\Lambda_s\Lambda_s^\top$:

```{r Phi }
p <- ncol(data_immune[[1]])        # number of variables
nrun <- dim(out10_1010$Phi)[3]     # size of the third dimension (saved draws)
SigmaPhi <- SigmaLambda1 <- SigmaLambda2 <- array(0, dim = c(p, p, nrun))

# Cross-product of each loading matrix, one draw at a time
for (j in seq_len(nrun)) {
  SigmaPhi[, , j] <- tcrossprod(out10_1010$Phi[, , j])
  SigmaLambda1[, , j] <- tcrossprod(out10_1010$Lambda[[1]][, , j])
  SigmaLambda2[, , j] <- tcrossprod(out10_1010$Lambda[[2]][, , j])
}

# Element-wise medians over the draws
SigmaPhi <- apply(SigmaPhi, c(1, 2), median)
SigmaLambda1 <- apply(SigmaLambda1, c(1, 2), median)
SigmaLambda2 <- apply(SigmaLambda2, c(1, 2), median)
Phi <- apply(out10_1010$Phi, c(1, 2), median)
```
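One reason to summarize the cross-products draw by draw is that a loading matrix is determined only up to an orthogonal rotation, while `tcrossprod()` of it is unaffected by such a rotation. The following base R sketch checks this on a toy matrix; the dimensions and the names `Phi_toy`, `Q`, and `Phi_rot` are illustrative.

```r
set.seed(2)

Phi_toy <- matrix(rnorm(6 * 2), 6, 2)    # toy 6 x 2 loading matrix

# Random 2 x 2 orthogonal matrix from the QR decomposition of a Gaussian matrix
Q <- qr.Q(qr(matrix(rnorm(4), 2, 2)))

Phi_rot <- Phi_toy %*% Q                 # rotated loadings: a different matrix

# The loadings differ, but the implied cross-product does not
max(abs(Phi_rot - Phi_toy))                             # generally nonzero
max(abs(tcrossprod(Phi_rot) - tcrossprod(Phi_toy)))     # zero up to rounding
```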

### Choice of the number of factors

Then we proceed to the choice of the number of common latent factors.

```{r SigmaPhi, fig.width=4.5, fig.height=4.5}
plot(sp_eigen(SigmaPhi), pch = 16)
abline(h = 0.05, col = 2)   # 5% threshold
```

We note that 5 factors are above the $5\%$ threshold, so we choose $K=5$. We proceed in a similar way for the two study-specific loading matrices:

```{r SigmaLambda, fig.width=6.5, fig.height=4.5}
par(mfrow = c(1, 2))   # one panel per study
plot(sp_eigen(SigmaLambda1), pch = 16)
abline(h = 0.05, col = 2)
plot(sp_eigen(SigmaLambda2), pch = 16)
abline(h = 0.05, col = 2)
```

We end up with dimensions 3 and 4 for the study-specific factor loadings.
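The selection rule used above can be sketched in base R as follows. The sketch assumes that the plotted quantities are the eigenvalues of the estimated matrix divided by their sum, so that the 0.05 line marks factors explaining more than 5% of the variability. This is an assumption about what `sp_eigen` returns, and the helper `prop_eigen` and the toy matrix are hypothetical.

```r
# Hypothetical helper: share of total variability carried by each eigenvalue
prop_eigen <- function(Sigma) {
  ev <- eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values
  ev <- pmax(ev, 0)        # guard against tiny negative values from rounding
  ev / sum(ev)
}

set.seed(3)
# Toy low-rank matrix: 2 strong directions and 1 much weaker one
L_toy <- cbind(matrix(rnorm(30 * 2), 30, 2), 0.05 * rnorm(30))
Sigma_toy <- tcrossprod(L_toy)

prop <- prop_eigen(Sigma_toy)
n_factors <- sum(prop > 0.05)   # number of factors above the 5% threshold
n_factors
```

Counting the proportions above the threshold gives the number of factors to retain, which is the quantity read off the plots above.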

### OP postprocessing

We post-process the estimated loading matrix with the OP (orthogonal Procrustes) procedure:

```{r OP}
# Keep the first 5 common factors, as chosen above
Phi_OP10 <- sp_OP(out10_1010$Phi[, 1:5, ], itermax = 10)
```

For larger data sets, we recommend reducing the output level in the call to `sp_msfa`.
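As a rough illustration of what an orthogonal Procrustes step does, the base R sketch below rotates one loading matrix onto a reference by solving the classical Procrustes problem with an SVD. It is not the `sp_OP` implementation, which operates on the whole array of draws; the function `procrustes_rotate` and the toy matrices are hypothetical.

```r
# Hypothetical helper: apply the orthogonal D minimising ||A %*% D - B||_F
procrustes_rotate <- function(A, B) {
  s <- svd(crossprod(A, B))     # SVD of t(A) %*% B
  D <- s$u %*% t(s$v)           # optimal orthogonal transformation
  A %*% D
}

set.seed(4)
B <- matrix(rnorm(8 * 3), 8, 3)                 # reference loadings
Q <- qr.Q(qr(matrix(rnorm(9), 3, 3)))           # random orthogonal matrix
A <- B %*% Q                                    # same loadings, rotated

A_aligned <- procrustes_rotate(A, B)
max(abs(A_aligned - B))                         # zero up to rounding
```

Aligning draws to a common orientation in this way is what makes element-wise summaries of the loadings meaningful, since otherwise each draw may be expressed in a different rotation.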