https://github.com/guokai8/o2plsda

Omics data integration with o2plsda
integration multi-omics o2pls omics plsda

Host: GitHub
URL: https://github.com/guokai8/o2plsda
Owner: guokai8
License: gpl-3.0
Created: 2021-10-04T15:23:56.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2025-12-02T01:35:08.000Z (7 months ago)
Last Synced: 2026-01-29T21:55:21.693Z (5 months ago)
Topics: integration, multi-omics, o2pls, omics, plsda
Language: R
Homepage:
Size: 294 KB
Stars: 8
Watchers: 2
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# o2plsda: Multiomics Data Integration

# o2plsda [![Project Status:](http://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)  [![](https://img.shields.io/badge/devel%20version-0.0.28-green.svg)](https://github.com/guokai8/o2plsda)  ![Code Size:](https://img.shields.io/github/languages/code-size/guokai8/o2plsda)![](https://img.shields.io/badge/license-GPL--3-blue.svg)[![DOI](https://zenodo.org/badge/413478714.svg)](https://zenodo.org/badge/latestdoi/413478714)


_o2plsda_ provides functions for O2PLS-DA analysis in multi-omics data integration. The algorithm comes from "O2-PLS, a two-block (X±Y) latent variable regression (LVR) method with an integral OSC filter", published by Johan Trygg and Svante Wold in 2003. O2PLS is a bidirectional multivariate regression method that separates the covariance between two data sets from the systematic sources of variance specific to each data set; it was later extended to multiple data sets (Löfstedt and Trygg, 2011; Löfstedt et al., 2012).
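For orientation, the decomposition described above is commonly written as follows. This is a sketch in standard O2PLS notation, not taken from the package code: $T$ and $U$ are the joint scores with loadings $W$ and $C$, $E$ and $F$ are residuals, and the symbols for the data-set-specific (orthogonal) parts, $T_{o}$, $P_{o}$, $U_{o}$ and $Q_{o}$, are assumed here for illustration.

$$
\begin{aligned}
X &= T W^\top + T_{o} P_{o}^\top + E \\
Y &= U C^\top + U_{o} Q_{o}^\top + F
\end{aligned}
$$

In the examples in this README, `nc` is the number of joint components, while `nx` and `ny` are the numbers of orthogonal components in X and Y.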

### Cross-Validation

To avoid overfitting, the optimal number of latent variables for each model structure is estimated using group-balanced Monte Carlo cross-validation (MCCV). The package can use the group information when selecting the best parameters by cross-validation. In cross-validation (CV), one minimizes a measure of error over parameters that must be fixed a priori. Here there are three parameters: (nc, nx, ny). A popular measure is the prediction error $\lVert Y - \hat{Y} \rVert$, where $\hat{Y}$ is a prediction of $Y$. Because the O2PLS method is symmetric in X and Y, we minimize the sum of the prediction errors:

$$\lVert X - \hat{X} \rVert + \lVert Y - \hat{Y} \rVert$$

Here nc should be a positive integer, and nx and ny should be non-negative integers. The best values are the minimizers of this prediction error.
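The two ideas above, group-balanced splitting and a summed prediction error, can be illustrated in base R. This is only a minimal sketch: `balanced_folds()` and `joint_error()` are hypothetical helpers, not functions of _o2plsda_, and the Frobenius norm is an assumed choice of norm. The package's own splitting scheme and error measure (it reports an RMSE) may differ.

```r
# Hypothetical helper (not part of o2plsda): assign samples to folds so that
# every group is spread as evenly as possible across the folds.
balanced_folds <- function(group, nr_folds) {
  group <- as.factor(group)
  folds <- integer(length(group))
  for (g in levels(group)) {
    idx <- which(group == g)
    # shuffle the samples of this group, then deal them out round-robin
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(nr_folds), length(idx))
  }
  folds
}

# Hypothetical helper: summed prediction error for X and Y,
# using the Frobenius norm (an assumption made for this sketch).
joint_error <- function(X, X_hat, Y, Y_hat) {
  norm(X - X_hat, type = "F") + norm(Y - Y_hat, type = "F")
}

set.seed(123)
group <- rep(c("Ctrl", "Treat"), each = 25)
folds <- balanced_folds(group, nr_folds = 5)
# rows are groups, columns are folds; each cell counts samples
table(group, folds)
```

With 25 samples per group and 5 folds, every fold receives the same number of samples from each group, so no fold is dominated by one condition.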

The O2PLS-DA analysis was performed as described by Bylesjö et al. (2007); briefly, the O2PLS predictive variation [$TW^\top$, $UC^\top$] was used for a subsequent O2PLS-DA analysis. The Variable Importance in the Projection (VIP) value was calculated as a weighted sum of the squared correlations between the OPLS-DA components and the original variable.

## Installation

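```{r,eval=FALSE}
# install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("guokai8/o2plsda")
```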

## Examples

```{r}
library(o2plsda)
set.seed(123)
# X and Y: 50 samples (rows) x 100 features (columns)
X <- matrix(rnorm(5000), 50, 100)
Y <- matrix(rnorm(5000), 50, 100)
rownames(X) <- paste0("S", 1:50)
rownames(Y) <- paste0("S", 1:50)
colnames(X) <- paste0("Gene", 1:100)
colnames(Y) <- paste0("Lipid", 1:100)
# center and scale each column
X <- scale(X, scale = TRUE)
Y <- scale(Y, scale = TRUE)
## the group factor can be omitted if the samples have no grouping
group <- rep(c("Ctrl", "Treat"), each = 25)
```

Do cross validation with group information

```{r}
set.seed(123)
## nr_folds : number of cross-validation folds (10 is suggested)
## ncores : number of cores for parallel computation on large datasets
cv <- o2cv(X, Y, 1:5, 1:3, 1:3, group = group, nr_folds = 10)
#> #####################################
#> The best parameters are nc = 1, nx = 2, ny = 3
#> #####################################
#> The the RMSE is: 1.98186790425324
#> #####################################
```

Then we can run the O2PLS analysis with nc = 1, nx = 2, ny = 3. You can also select the best parameters by inspecting the cross-validation results.

```{r}
fit <- o2pls(X, Y, 1, 2, 3)
summary(fit)
#> ######### Summary of the O2PLS results #########
#> ### Call o2pls(X, Y, nc= 1 , nx= 2 , ny= 3 ) ###
#> ### Total variation 
#> ### X: 4900 ; Y: 4900  ###
#> ### Total modeled variation ### X: 0.12 ; Y: 0.16  ###
#> ### Joint, Orthogonal, Noise (proportions) ###
#>                X     Y
#> Joint      0.041 0.047
#> Orthogonal 0.079 0.113
#> Noise      0.880 0.840
#> ### Variation in X joint part predicted by Y Joint part: 0.934 
#> ### Variation in Y joint part predicted by X Joint part: 0.934 
#> ### Variation in each Latent Variable (LV) in Joint part: 
#>     LV1
#> X 0.041
#> Y 0.047
#> ### Variation in each Latent Variable (LV) in X Orthogonal part: 
#>     LV1   LV2
#> X 0.044 0.035
#> ### Variation in each Latent Variable (LV) in Y Orthogonal part: 
#>     LV1   LV2   LV3
#> Y 0.046 0.035 0.033
#> ############################################
```
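The total variation of 4900 in the summary is consistent with the sum of squares of the scaled input matrix. The base R check below is a sketch under that assumption; it does not call the package. After `scale()`, each of the 100 columns has a sum of squares of 50 - 1 = 49.

```r
set.seed(123)
# same simulated X as in the example: 50 samples x 100 features, centered and scaled
X <- scale(matrix(rnorm(5000), 50, 100))
# each scaled column has sum of squares n - 1 = 49, so the total is 49 * 100
total_variation <- sum(X^2)
total_variation
```

The proportions in the summary also add up as expected: for X, joint (0.041) plus orthogonal (0.079) gives the total modeled variation of 0.12, and the remaining 0.880 is noise. The large noise share is unsurprising here because X and Y are independent random matrices.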

Extract the loadings and scores from the fit results

```{r}
Xl <- loadings(fit, loading = "Xjoint")
Xs <- scores(fit, score = "Xjoint")
plot(fit, type = "score", var = "Xjoint", group = group)
plot(fit, type = "loading", var = "Xjoint", group = group, repel = FALSE, rotation = TRUE)
```

Run OPLS-DA on the O2PLS results

```{r}
res <- oplsda(fit, group, nc = 1)
plot(res, type = "score", group = group)
# store the VIP values under a name that does not mask the vip() function
vip_res <- vip(res)
plot(res, type = "vip", group = group, repel = FALSE, order = TRUE)
```

## Note

The package is still under development.

## Citation

If you like this package, please contact me for the citation.

## Contact information

For any questions please contact guokai8@gmail.com or https://github.com/guokai8/o2plsda/issues
