https://github.com/jasenfinch/metabolyser

Methods for pre-treatment, modelling/data mining, and correlation analyses of metabolomics data
https://github.com/jasenfinch/metabolyser

correlation-analyses data-mining metabolomics-data modelling pre-treatment r-package

Last synced: 5 months ago
JSON representation

Methods for pre-treatment, modelling/data mining, and correlation analyses of metabolomics data

Host: GitHub
URL: https://github.com/jasenfinch/metabolyser
Owner: jasenfinch
Created: 2017-04-21T12:48:23.000Z (about 8 years ago)
Default Branch: master
Last Pushed: 2024-04-10T13:08:17.000Z (over 1 year ago)
Last Synced: 2024-12-18T11:15:35.858Z (7 months ago)
Topics: correlation-analyses, data-mining, metabolomics-data, modelling, pre-treatment, r-package
Language: R
Homepage: https://jasenfinch.github.io/metabolyseR/
Size: 58.8 MB
Stars: 4
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd

Awesome Lists containing this project

README

        ---

output: github_document

---

```{r, echo = FALSE}

knitr::opts_chunk$set(collapse = TRUE, 

                      comment = "#>",

                      fig.align = 'center',

                      fig.path = "man/figures/README-",

                      message = FALSE,

                      warning = FALSE)

```

# metabolyseR

[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)

[![R-CMD-check](https://github.com/jasenfinch/metabolyseR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jasenfinch/metabolyseR/actions/workflows/R-CMD-check.yaml)

[![codecov](https://codecov.io/gh/jasenfinch/metabolyseR/branch/master/graph/badge.svg)](https://codecov.io/gh/jasenfinch/metabolyseR/branch/master) 

[![license](https://img.shields.io/badge/license-GNU%20GPL%20v3.0-blue.svg)](https://github.com/jasenfinch/metabolyseR/blob/master/DESCRIPTION)

[![DOI](https://zenodo.org/badge/88983134.svg)](https://zenodo.org/badge/latestdoi/88983134)

[![GitHub release](https://img.shields.io/github/release/jasenfinch/metabolyseR.svg)](https://GitHub.com/jasenfinch/metabolyseR/releases/)

> **A tool kit for pre-treatment, modelling, feature selection and correlation analyses of metabolomics data.**

## Overview

This package provides a tool kit of methods for metabolomics analyses that includes: 

* data pre-treatment

* multivariate and univariate modelling/data mining techniques

* correlation analysis

## Installation

The `metabolyseR` package can be installed from GitHub using the following:

```r

remotes::install_github('jasenfinch/metabolyseR')

```

The package documentation can be browsed online at ; however, if users want to compile the vignettes locally, the following can be used.

```r

remotes::install_github('jasenfinch/metabolyseR',build_vignettes = TRUE,dependencies = TRUE)

```

## Learn more

The package documentation can be browsed online at . 

If this is your first time using `metabolyseR` see the [Introduction](https://jasenfinch.github.io/metabolyseR/articles/metabolyseR.html) vignette or the quick start analysis below for information on how to get started.

If you believe you've found a bug in `metabolyseR`, please file a bug (and, if

possible, a [reproducible example](https://reprex.tidyverse.org)) at

.

## Quick start example analysis

This example analysis will use the `abr1` data set from the [metaboData](https://aberhrml.github.io/metaboData/) package. 

It is nominal mass flow-injection mass spectrometry (FI-MS) fingerprinting data from a plant-pathogen infection time course experiment.

The analysis will also include use of the pipe `%>%` from the [magrittr](https://magrittr.tidyverse.org/) package.

First load the necessary packages.

```{r setup}

library(metabolyseR)

library(metaboData)

```

For this example we will use only the negative acquisition mode data (`abr1$neg`) and sample meta-information (`abr1$fact`).

Create an `AnalysisData` class object using the following:

```{r analysis_data}

d <- analysisData(abr1$neg,abr1$fact)

```

The data includes `r nSamples(d)` samples and `r nFeatures(d)` mass spectral features as shown below.

```{r print_analysis_data}

d

```

The `clsAvailable()` function can be used to identify the columns available in our meta-information table. 

```{r}

clsAvailable(d)

```

For this analysis, we will be using the infection time course class information contained in the `day` column.

This can be extracted and the class frequencies tabulated using the following:

```{r}

d %>%

  clsExtract(cls = 'day') %>%

  table()

```

As can be seen above, the experiment is made up of six infection time point classes that includes a healthy control class (`H`) and five day infection time points (`1-5`), each with 20 replicates. 

For data pre-treatment prior to statistical analysis, a two-thirds maximum class occupancy filter can be applied.

Features where the maximum proportion of non-missing data per class is above two-thirds are retained.

A total ion count normalisation will also be applied.

```{r pre_treat}

d <- d %>%

  occupancyMaximum(cls = 'day', occupancy = 2/3) %>%

  transformTICnorm()

```

```{r pre_treat_result}

d

```

This has reduced the data set to `r nFeatures(d)` relevant features.

The structure of the data can be visualised using both unsupervised and supervised methods. For instance, the first two principle components from a principle component analysis (PCA) of the data with the sample points coloured by infection class can be plotted using: 

```{r pca}

plotPCA(d,cls = 'day',xAxis = 'PC1',yAxis = 'PC2')

```

And similarly, multidimensional scaling (MDS) of sample proximity values from a supervised random forest classification model along with receiver operator characteristic (ROC) curves.

```{r supervised_RF}

plotSupervisedRF(d,cls = 'day')

```

A progression can clearly be seen from the earliest to latest infected time points.

For feature selection, one-way analysis of variance (ANOVA) can be performed for each feature to identify features significantly explanatory for the infection time point.

```{r anova}

anova_results <- d %>%

  anova(cls = 'day')

```

A table of the significantly explanatory features can be extracted with a bonferroni correction adjusted p value < 0.05 using:

```{r explanatoty_features_extract}

explan_feat <- explanatoryFeatures(anova_results,threshold = 0.05)

```

```{r,explanatory_features}

explan_feat

```

The ANOVA has identified `r nrow(explan_feat)` features significantly explanatory over the infection time course.

A heat map of the mean relative intensity for each class of these explanatory features can be plotted to visualise their trends between the infection time point classes.

```{r rf_heatmap,fig.height=10,fig.width=5}

plotExplanatoryHeatmap(anova_results,

                       threshold = 0.05,

                       featureNames = FALSE)

```

Many of the explanatory features can be seen to be most highly abundant in the final infection time point `5`.

Finally, box plots of the trends of individual features can be plotted, such as the `N341` feature below.

```{r feature_plot}

plotFeature(anova_results,feature = 'N341',cls = 'day')

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jasenfinch/metabolyser

Awesome Lists containing this project

README