https://github.com/caranathunge/promor

A comprehensive R package for label-free proteomics data analysis and modeling
https://github.com/caranathunge/promor

biomarkers differential-expression lfq machine-learning mass-spectrometry modeling proteomics r r-package rstats

Last synced: 22 days ago
JSON representation

A comprehensive R package for label-free proteomics data analysis and modeling

Host: GitHub
URL: https://github.com/caranathunge/promor
Owner: caranathunge
License: lgpl-2.1
Created: 2022-04-14T18:34:08.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-07-21T14:46:38.000Z (about 2 years ago)
Last Synced: 2025-09-21T01:28:56.868Z (26 days ago)
Topics: biomarkers, differential-expression, lfq, machine-learning, mass-spectrometry, modeling, proteomics, r, r-package, rstats
Language: R
Homepage: https://caranathunge.github.io/promor/
Size: 161 MB
Stars: 16
Watchers: 2
Forks: 5
Open Issues: 2
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md

Awesome Lists containing this project

README

          
---

output: 

  github_document:

    number_sections: FALSE

---

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "70%"

)

```

# promor 

### Proteomics Data Analysis and Modeling Tools

[![CRAN status](https://www.r-pkg.org/badges/version/promor)](https://CRAN.R-project.org/package=promor)

[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/promor?color=blue)](https://r-pkg.org/pkg/promor)

[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/grand-total/promor?color=blue)](https://r-pkg.org/pkg/promor)

[![R-CMD-check](https://github.com/caranathunge/promor/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/caranathunge/promor/actions/workflows/R-CMD-check.yaml)

[![test-coverage](https://github.com/caranathunge/promor/actions/workflows/test-coverage.yaml/badge.svg)](https://github.com/caranathunge/promor/actions/workflows/test-coverage.yaml)

[![License: LGPL v2.1](https://img.shields.io/badge/License-LGPL_v2.1-blue.svg)](https://www.gnu.org/licenses/lgpl-2.1)

* `promor` is a user-friendly, comprehensive R package that combines proteomics 

data analysis with machine learning-based modeling. 

* `promor` streamlines differential expression analysis of 

**label-free quantification (LFQ)** proteomics data and building predictive 

models with top protein candidates.

* With `promor` we provide a range of quality control and visualization tools to analyze label-free proteomics data at the 

protein level.

* Input files for `promor` are a [proteinGroups.txt ](https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt)

file produced by [**MaxQuant**](https://maxquant.org) or a [standard input file](https://raw.githubusercontent.com/caranathunge/promor_example_data/main/st.txt) containing a quantitative matrix of protein intensities and an [expDesign.txt](https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt) file containing the experimental design of your proteomics data.

* The standard input file should be a tab-delimited text file. Proteins or protein groups should be indicated by rows and samples by columns. Protein names should be listed in the first column and you may use a column name of your choice for the first column. The remaining sample column names should match the sample names indicated by the mq_label column in the expDesign.txt file.

:rotating_light:**Check out our R Shiny app:** [PROMOR App](https://sgrbnf.shinyapps.io/PROMOR_App/)

___

### Installation

Install the released version from CRAN

``` r

install.packages("promor")

```

Install development version from [GitHub](https://github.com/caranathunge/promor)

``` r

# install devtools, if you haven't already:

install.packages("devtools")

# install promor from github

devtools::install_github("caranathunge/promor")

```

---

### Proteomics data analysis with promor

![promor prot analysis flow chart by caranathunge](./man/figures/promor_ProtAnalysisFlowChart_small.png){width=100%}

*Figure 1. A schematic diagram of suggested workflows for proteomics data analysis with promor.*

#### Example

Here is a minimal working example showing how to identify differentially

expressed proteins between two conditions using `promor` in five simple steps. 

We use a previously published data set from [Cox et al. (2014)](https://europepmc.org/article/MED/24942700#id609082) (PRIDE ID: PXD000279).

```{r example, results = 'hide', warning=FALSE, eval = FALSE}

# Load promor

library(promor)

# Create a raw_df object with the files provided in this github account.

raw <- create_df(

  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",

  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt"

)

# Filter out proteins with high levels of missing data in either condition or group

raw_filtered <- filterbygroup_na(raw)

# Impute missing data and create an imp_df object.

imp_df <- impute_na(raw_filtered)

# Normalize data and create a norm_df object

norm_df <- normalize_data(imp_df)

# Perform differential expression analysis and create a fit_df object

fit_df <- find_dep(norm_df)

```

Lets take a look at the results using a volcano plot.

```{r volcanoplot, warning = FALSE,  eval=FALSE}

volcano_plot(fit_df, text_size = 5)

```

![](./man/figures/README-volcanoplot-2.png){width=70%}

---

### Modeling with promor

![promor flowchart-modeling by caranathunge](./man/figures/promor_ProtModelingFlowChart_small.png){width=100%}

*Figure 2. A schematic diagram of suggested workflows for building predictive models with promor.*

#### Example

The following minimal working example shows you how to use your results from  differential expression analysis to build machine learning-based predictive models using `promor`. 

We use a previously published data set from [Suvarna et al. (2021)](https://www.frontiersin.org/articles/10.3389/fphys.2021.652799/full#h3) that used differentially expressed proteins between severe and non-severe COVID patients to build models to predict COVID severity.

```{r modeling_example, results = 'hide', warning = FALSE, message = F, eval = FALSE}

# First, let's make a model_df object of top differentially expressed proteins.

# We will be using example fit_df and norm_df objects provided with the package.

covid_model_df <- pre_process(

  fit_df = covid_fit_df,

  norm_df = covid_norm_df

)

# Next, we split the data into training and test data sets

covid_split_df <- split_data(model_df = covid_model_df)

# Let's train our models using the default list of machine learning algorithms

covid_model_list <- train_models(split_df = covid_split_df)

# We can now use our models to predict the test data

covid_prob_list <- test_models(

  model_list = covid_model_list,

  split_df = covid_split_df

)

```

Let's make ROC plots to check how the different models performed.

```{r rocplot, warning = FALSE, eval = FALSE,  message = F}

roc_plot(

  probability_list = covid_prob_list,

  split_df = covid_split_df

)

```

![](./man/figures/README-rocplot-1.png){width=70%} 

---

### Tutorials

You can choose a tutorial from the list below that best fits your experiment 

and the structure of your proteomics data.

1. This README file can be accessed from RStudio as follows,

``` r

vignette("intro_to_promor", package = "promor")

```

2. If your data do NOT contain technical replicates:

[promor: No technical replicates](https://caranathunge.github.io/promor/articles/promor_no_techreps.html)

3. If your data contain technical replicates:

[promor: Technical replicates](https://caranathunge.github.io/promor/articles/promor_with_techreps.html)

4. If you would like to use your proteomics data to build predictive models:

[promor: Modeling](https://caranathunge.github.io/promor/articles/promor_for_modeling.html)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/caranathunge/promor

Awesome Lists containing this project

README