{"id":31297267,"url":"https://github.com/caranathunge/promor","last_synced_at":"2025-09-24T22:06:13.386Z","repository":{"id":43306112,"uuid":"481711277","full_name":"caranathunge/promor","owner":"caranathunge","description":"A comprehensive R package for label-free proteomics data analysis and modeling","archived":false,"fork":false,"pushed_at":"2023-07-21T14:46:38.000Z","size":168547,"stargazers_count":16,"open_issues_count":2,"forks_count":5,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-09-21T01:28:56.868Z","etag":null,"topics":["biomarkers","differential-expression","lfq","machine-learning","mass-spectrometry","modeling","proteomics","r","r-package","rstats"],"latest_commit_sha":null,"homepage":"https://caranathunge.github.io/promor/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-2.1","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/caranathunge.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-14T18:34:08.000Z","updated_at":"2025-06-03T21:39:39.000Z","dependencies_parsed_at":"2023-02-10T12:16:34.578Z","dependency_job_id":null,"html_url":"https://github.com/caranathunge/promor","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/caranathunge/promor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caranathunge%2Fpromor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caranathunge%2Fpromor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caranathunge%2Fpromor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caranathunge%2Fpromor/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/caranathunge","download_url":"https://codeload.github.com/caranathunge/promor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caranathunge%2Fpromor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276824969,"owners_count":25711261,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-24T02:00:09.776Z","response_time":97,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biomarkers","differential-expression","lfq","machine-learning","mass-spectrometry","modeling","proteomics","r","r-package","rstats"],"created_at":"2025-09-24T22:06:12.080Z","updated_at":"2025-09-24T22:06:13.380Z","avatar_url":"https://github.com/caranathunge.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n---\noutput: \n  github_document:\n    number_sections: FALSE\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"70%\"\n)\n```\n\n\n\n# promor \u003cimg src=\"man/figures/logo.png\" align=\"right\" height=\"200\" style=\"float:right; height:200px;\"\u003e\n\n### Proteomics Data Analysis and Modeling Tools\n\n\u003c!-- badges: start --\u003e\n[![CRAN status](https://www.r-pkg.org/badges/version/promor)](https://CRAN.R-project.org/package=promor)\n[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/promor?color=blue)](https://r-pkg.org/pkg/promor)\n[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/grand-total/promor?color=blue)](https://r-pkg.org/pkg/promor)\n[![R-CMD-check](https://github.com/caranathunge/promor/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/caranathunge/promor/actions/workflows/R-CMD-check.yaml)\n[![test-coverage](https://github.com/caranathunge/promor/actions/workflows/test-coverage.yaml/badge.svg)](https://github.com/caranathunge/promor/actions/workflows/test-coverage.yaml)\n[![License: LGPL v2.1](https://img.shields.io/badge/License-LGPL_v2.1-blue.svg)](https://www.gnu.org/licenses/lgpl-2.1)\n\u003c!-- badges: end --\u003e\n\n\n* `promor` is a user-friendly, comprehensive R package that combines proteomics \ndata analysis with machine learning-based modeling. 
\n\n* `promor` streamlines differential expression analysis of \n**label-free quantification (LFQ)** proteomics data and building predictive \nmodels with top protein candidates.\n\n* With `promor` we provide a range of quality control and visualization tools to analyze label-free proteomics data at the \nprotein level.\n\n* Input files for `promor` are a [proteinGroups.txt](https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt)\nfile produced by [**MaxQuant**](https://maxquant.org) or a [standard input file](https://raw.githubusercontent.com/caranathunge/promor_example_data/main/st.txt) containing a quantitative matrix of protein intensities and an [expDesign.txt](https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt) file containing the experimental design of your proteomics data.\n\n* The standard input file should be a tab-delimited text file. Proteins or protein groups should be indicated by rows and samples by columns. Protein names should be listed in the first column and you may use a column name of your choice for the first column. The remaining sample column names should match the sample names indicated by the mq_label column in the expDesign.txt file.\n\n:rotating_light:**Check out our R Shiny app:** [PROMOR App](https://sgrbnf.shinyapps.io/PROMOR_App/)\n\n___\n\n### Installation\n\nInstall the released version from CRAN\n\n``` r\ninstall.packages(\"promor\")\n```\n\nInstall the development version from [GitHub](https://github.com/caranathunge/promor)\n\n``` r\n# install devtools, if you haven't already:\ninstall.packages(\"devtools\")\n\n# install promor from GitHub\ndevtools::install_github(\"caranathunge/promor\")\n```\n\n---\n\n\n### Proteomics data analysis with promor\n\n![promor prot analysis flow chart by caranathunge](./man/figures/promor_ProtAnalysisFlowChart_small.png){width=100%}\n*Figure 1. 
A schematic diagram of suggested workflows for proteomics data analysis with promor.*\n\n\n#### Example\n\nHere is a minimal working example showing how to identify differentially\nexpressed proteins between two conditions using `promor` in five simple steps. \nWe use a previously published data set from [Cox et al. (2014)](https://europepmc.org/article/MED/24942700#id609082) (PRIDE ID: PXD000279).\n\n```{r example, results = 'hide', warning=FALSE, eval = FALSE}\n# Load promor\nlibrary(promor)\n\n# Create a raw_df object with the files provided in this GitHub account.\nraw \u003c- create_df(\n  prot_groups = \"https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt\",\n  exp_design = \"https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt\"\n)\n\n# Filter out proteins with high levels of missing data in either condition or group\nraw_filtered \u003c- filterbygroup_na(raw)\n\n# Impute missing data and create an imp_df object.\nimp_df \u003c- impute_na(raw_filtered)\n\n# Normalize data and create a norm_df object\nnorm_df \u003c- normalize_data(imp_df)\n\n# Perform differential expression analysis and create a fit_df object\nfit_df \u003c- find_dep(norm_df)\n```\n\nLet's take a look at the results using a volcano plot.\n\n```{r volcanoplot, warning = FALSE,  eval=FALSE}\nvolcano_plot(fit_df, text_size = 5)\n```\n\n\u003ccenter\u003e\n\n![](./man/figures/README-volcanoplot-2.png){width=70%}\n\n\u003c/center\u003e\n\n\n---\n\n### Modeling with promor\n\n![promor flowchart-modeling by caranathunge](./man/figures/promor_ProtModelingFlowChart_small.png){width=100%}\n*Figure 2. A schematic diagram of suggested workflows for building predictive models with promor.*\n\n#### Example\n\nThe following minimal working example shows you how to use your results from differential expression analysis to build machine learning-based predictive models using `promor`. \n\nWe use a previously published data set from [Suvarna et al. 
(2021)](https://www.frontiersin.org/articles/10.3389/fphys.2021.652799/full#h3) that used differentially expressed proteins between severe and non-severe COVID patients to build models to predict COVID severity.\n\n```{r modeling_example, results = 'hide', warning = FALSE, message = FALSE, eval = FALSE}\n# First, let's make a model_df object of top differentially expressed proteins.\n# We will be using example fit_df and norm_df objects provided with the package.\ncovid_model_df \u003c- pre_process(\n  fit_df = covid_fit_df,\n  norm_df = covid_norm_df\n)\n\n# Next, we split the data into training and test data sets\ncovid_split_df \u003c- split_data(model_df = covid_model_df)\n\n# Let's train our models using the default list of machine learning algorithms\ncovid_model_list \u003c- train_models(split_df = covid_split_df)\n\n# We can now use our models to predict the test data\ncovid_prob_list \u003c- test_models(\n  model_list = covid_model_list,\n  split_df = covid_split_df\n)\n```\n\n\nLet's make ROC plots to check how the different models performed.\n\n```{r rocplot, warning = FALSE, eval = FALSE,  message = FALSE}\n\nroc_plot(\n  probability_list = covid_prob_list,\n  split_df = covid_split_df\n)\n```\n\n\n\u003ccenter\u003e\n\n![](./man/figures/README-rocplot-1.png){width=70%} \n\u003c/center\u003e\n\n---\n\n### Tutorials\n\nYou can choose a tutorial from the list below that best fits your experiment \nand the structure of your proteomics data.\n\n1. This README file can be accessed from RStudio as follows:\n\n``` r\nvignette(\"intro_to_promor\", package = \"promor\")\n```\n\n2. If your data do NOT contain technical replicates:\n[promor: No technical replicates](https://caranathunge.github.io/promor/articles/promor_no_techreps.html)\n\n3. If your data contain technical replicates:\n[promor: Technical replicates](https://caranathunge.github.io/promor/articles/promor_with_techreps.html)\n\n4. 
If you would like to use your proteomics data to build predictive models:\n[promor: Modeling](https://caranathunge.github.io/promor/articles/promor_for_modeling.html)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaranathunge%2Fpromor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcaranathunge%2Fpromor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaranathunge%2Fpromor/lists"}