Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rdevito/MSFA
https://github.com/rdevito/MSFA
Last synced: 28 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/rdevito/MSFA
- Owner: rdevito
- Created: 2020-03-14T14:22:52.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-04-28T11:42:09.000Z (about 2 years ago)
- Last Synced: 2024-02-25T06:33:36.021Z (4 months ago)
- Language: R
- Size: 60.6 MB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Lists
- awesome-multi-omics - MSFA - De Vito - multi-study factor analysis: same features, different samples - [paper](https://arxiv.org/abs/1611.06350) (Software packages and methods / Multi-study correlation or factor analysis)
README
# MSFA
author: Roberta De Vito, Ruggero Bellio
Fits the Multi-Study Factor Analysis model via [ECM algorithm](#1-fitting-a-msfa-model-via-the-ecm-algorithm) and via [a Bayesian approach](#2-bayesian-analysis-of-a-msfa-model).
## 1 Fitting a MSFA model via the ECM Algorithm
The following example illustrates how to fit a MSFA model via the ECM Algorithm,
using a data set available in the Bioconductor repository (www.bioconductor.org).## Getting the data
Some pre-processing is required to get the data into a form suitable for the
analysis. This was already done, and the resulting data frame is saved into the
`data_immune` object. The commands that were used to form it are included
in the help file for the data object.```{r help2, echo = TRUE, results = TRUE, tidy = TRUE}
library(MSFA)
data(data_immune)
help(data_immune)
```## Obtaining suitable starting values for model parameters
Then we get suitable starting values for model parameters, selecting K=3 common
factors and (3, 4) study-specific factors for the two studies, respectively.```{r, starting values, messages = FALSE}
start_value <- start_msfa(X_s = data_immune, k = 3, j_s = c(3, 4))
```## Fitting the model via ECM
Now everything is in place for estimating the model parameters via the ECM algorithm```{r get estimate, results = FALSE}
mle <- ecm_msfa(data_immune, start_value, trace = FALSE)
```The estimated matrix of common loadings can be represented by a suitable heatmap:
```{r heatmap, fig.show = 'hold', fig.width = 7.5, fig.height = 6.5, message = FALSE}
library(gplots)
heatmap.2(mle$Phi,dendrogram='none', Rowv=FALSE, Colv=FALSE,trace='none', density.info="none", col=heat.colors(256))
```---
## 2 Bayesian Analysis of a MSFA modelThe following example illustrates a Bayesian analysis of a MSFA model.
Although the methodology has been developed targeting the $p>n$ case, for the sake
of simplicity we illustrate the analysis of the same data set employed for
maximum likelihood estimation. The data set is
available in the Bioconductor repository (www.bioconductor.org).### Getting the data
Some pre-processing is required to get the data into a form suitable for the
analysis. This was already done, and the resulting data frame is saved into the
`data_immune` object. The commands that were used to form it are included
in the help file for the data object.```{r help3, echo = TRUE, results = TRUE, tidy = TRUE}
library(MSFA)
data(data_immune)
help(data_immune)
```## Sampling from the posterior distribution
We fist estimate a model with a somewhat large dimension of the various loading matrices,
so we set a dimension 10 for both the common factor loadings and the study-specific loadings. In order to get reproducible results, we set the random seed.```{r posterior, messages = FALSE}
set.seed(1971)
out10_1010 <- sp_msfa(data_immune, k = 10, j_s = c(10, 10), trace = FALSE)
```We take as the estimated $\Sigma_\Phi$ the posterior median
```{r Phi }
p <- ncol(data_immune[[1]])
nrun <- dim(out10_1010$Phi)[3]
SigmaPhi <- SigmaLambda1 <- SigmaLambda2 <- array(0, dim=c(p, p, nrun))
for(j in 1:nrun)
{
SigmaPhi[,,j] <- tcrossprod(out10_1010$Phi[,,j])
SigmaLambda1[,,j] <- tcrossprod(out10_1010$Lambda[[1]][,,j])
SigmaLambda2[,,j] <- tcrossprod(out10_1010$Lambda[[2]][,,j])
}
SigmaPhi <- apply(SigmaPhi, c(1, 2), median)
SigmaLambda1 <- apply(SigmaLambda1, c(1, 2), median)
SigmaLambda2 <- apply(SigmaLambda2, c(1, 2), median)
Phi <- apply(out10_1010$Phi, c(1, 2), median)
```## Choice of the number of factors
Then we proceed to the choice of the number of common latent factors.```{r SigmaPhi, fig.width=4.5, fig.height=4.5}
plot(sp_eigen(SigmaPhi), pch = 16)
abline(h = 0.05, col = 2)
```We note that 5 factors are above the $5\%$ threshold, so we choose $K=5$. We proceed in a similar way for
the two study-specific loading matrices:
```{r SigmaLambda, fig.width=6.5, fig.height=4.5}
par(mfrow=c(1, 2))
plot(sp_eigen(SigmaLambda1), pch=16)
abline(h = 0.05, col = 2)
plot(sp_eigen(SigmaLambda2), pch=16)
abline(h = 0.05, col = 2)
```We end up with dimensions 3 and 4 for the study-specific factor loadings.
## OP prostprocessing
We post-process the estimated loading matrix by the OP procedure
```{r OP}
Phi_OP10 <- sp_OP(out10_1010$Phi[,1:5,], itermax = 10)
```For larger data size, we recommend to reduce the output level in the call to
```sp_msfa```.