{"id":20982023,"url":"https://github.com/dev-ev/isobaric-prot-wf-r","last_synced_at":"2025-08-13T19:20:49.668Z","repository":{"id":202026325,"uuid":"413045069","full_name":"dev-ev/isobaric-prot-wf-r","owner":"dev-ev","description":"Simple workflows for the isobaric-labeling proteomic data from Proteome Discoverer with ANOVA, t-testing, DEqMS/limma and annotation via fgsea","archived":false,"fork":false,"pushed_at":"2022-05-19T20:04:23.000Z","size":982,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-02T20:38:26.913Z","etag":null,"topics":["data-visualization","deqms","limma","proteomics","proteomics-data","statistical-analysis","tmt-data-analysis"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dev-ev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-10-03T10:28:00.000Z","updated_at":"2025-04-01T09:23:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"dcdbb412-372a-4cfe-b473-596f9cb0f866","html_url":"https://github.com/dev-ev/isobaric-prot-wf-r","commit_stats":null,"previous_names":["dev-ev/isobaric-prot-wf-r"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dev-ev%2Fisobaric-prot-wf-r","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dev-ev%2Fisobaric-prot-wf-r/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dev-ev%2Fisobaric-prot-wf-r/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/dev-ev%2Fisobaric-prot-wf-r/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dev-ev","download_url":"https://codeload.github.com/dev-ev/isobaric-prot-wf-r/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254182826,"owners_count":22028367,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-visualization","deqms","limma","proteomics","proteomics-data","statistical-analysis","tmt-data-analysis"],"created_at":"2024-11-19T05:42:21.734Z","updated_at":"2025-05-14T16:31:08.050Z","avatar_url":"https://github.com/dev-ev.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# isobaric-prot-wf-r\n## Simple workflow for isobaric-labeling proteomic data with ANOVA, t-testing, DEqMS/limma and annotation via fgsea\n\nThe author recommends [RStudio IDE](https://www.rstudio.com/). The script was developed and tested on Ubuntu 20.04 with R 3.6.\u003cbr\u003e\nUsage:\n* Download and run the \".R\" script, adapting the paths and formats\n* Pay particular attention to the format of the output file and to sample names. 
The example table has the quantitative data in \"Abundance Ratio\" columns; the biological replicates from the same group are named *xyz_1*, *xyz_2*, *xyz_3*.\n\nSee the main highlights of the workflow below:\n```r\n#library(readxl)\nlibrary(dplyr)\nlibrary(ggplot2)\nlibrary(tidyr)\nlibrary(DEqMS)\nlibrary(fgsea)\n```\nSelect the work directory and import the protein table.\n\u003cbr\u003eThe example data originates from the analysis of the commercial yeast [triple-knockout (TKO) TMT standard](https://www.thermofisher.com/order/catalog/product/A40938#/A40938)\n\u003cbr\u003eLC-MS raw files were processed via [Proteome Discoverer 2.4](https://www.thermofisher.com/order/catalog/product/OPTON-30812?ce=E.21CMD.DL107.39848.01\u0026cid=E.21CMD.DL107.39848.01\u0026ef_id=EAIaIQobChMIia_rv4-u8wIVAq-yCh2J3wI6EAAYASAAEgLKhfD_BwE:G:s\u0026s_kwcid=AL!3652!3!334040549172!e!!g!!proteome%20discoverer#/OPTON-30812?ce=E.21CMD.DL107.39848.01\u0026cid=E.21CMD.DL107.39848.01\u0026ef_id=EAIaIQobChMIia_rv4-u8wIVAq-yCh2J3wI6EAAYASAAEgLKhfD_BwE:G:s\u0026s_kwcid=AL!3652!3!334040549172!e!!g!!proteome%20discoverer), and the output was saved as a tab-separated text file with \"R-friendly header\" set to \"True\".\n\n```r\n#Set the working directory for the project\nsetwd(\"/home/trainee/Documents/R/Yeast_TKO/\")\n#If the table has been exported as a tab-separated text file\nall_proteins \u003c- read.table(\"MS3_FAIMS_Proteins_PD24.txt\", sep = '\\t', header = TRUE)\n#If the table is in Excel format\n#all_proteins \u003c- read_excel(\"MS3_FAIMS_Proteins_PD24.xlsx\")\n```\n\n```r\ndim(all_proteins)\n```\n\n```\n## [1] 1904   51\n```\n\nThe quantitative columns (abundance ratio columns) are extracted and renamed:\n```r\nres \u003c- rename_ratios(all_proteins)\nall_proteins \u003c- res[[1]]\nquan_columns \u003c- res[[2]]\nquan_columns\n```\n\n```\n##  [1] \"BY4741_2\" \"his4_1\"   \"his4_2\"   \"his4_3\"   \"met6_1\"   \"met6_2\"   \"met6_3\"   \"ura2_1\"   \"ura2_2\"   \"ura2_3\"\n```\n\nAfter 
several stages of filtering, we check the abundance distribution in each sample:\n\n```r\ndfWide \u003c- all_proteins %\u003e%\n  filter(!grepl(\"cont_\",Accession)) %\u003e%\n  subset (select=c(\"Accession\", quan_columns) ) %\u003e%\n  na.omit()\n\nrownames(dfWide) \u003c- dfWide$Accession\ndfWide$Accession \u003c- NULL\ndfWide \u003c- log2(dfWide)\n#Look at the distribution of quan values\nsummary(dfWide)\n```\n\n```\n##     BY4741_2            his4_1              his4_2              his4_3              met6_1        \n##  Min.   :-0.76121   Min.   :-2.426625   Min.   :-2.343732   Min.   :-2.434403   Min.   :-3.36587  \n##  1st Qu.:-0.08161   1st Qu.:-0.189351   1st Qu.:-0.193058   1st Qu.:-0.195946   1st Qu.:-0.23447  \n##  Median :-0.01013   Median :-0.020340   Median :-0.014500   Median :-0.017417   Median :-0.08238  \n##  Mean   :-0.01831   Mean   : 0.005853   Mean   : 0.003676   Mean   : 0.000839   Mean   :-0.06536  \n##  3rd Qu.: 0.04963   3rd Qu.: 0.180148   3rd Qu.: 0.186501   3rd Qu.: 0.180148   3rd Qu.: 0.09390  \n##  Max.   : 0.76043   Max.   : 3.528321   Max.   : 3.441218   Max.   : 3.478195   Max.   : 1.84679  \n##      met6_2             met6_3             ura2_1             ura2_2             ura2_3        \n##  Min.   :-3.15843   Min.   :-3.26534   Min.   :-2.49005   Min.   :-2.57347   Min.   :-2.44222  \n##  1st Qu.:-0.20257   1st Qu.:-0.22600   1st Qu.:-0.18033   1st Qu.:-0.18115   1st Qu.:-0.16974  \n##  Median :-0.06492   Median :-0.07704   Median :-0.03579   Median :-0.04097   Median :-0.03357  \n##  Mean   :-0.04076   Mean   :-0.05791   Mean   :-0.02990   Mean   :-0.03511   Mean   :-0.02894  \n##  3rd Qu.: 0.10467   3rd Qu.: 0.09119   3rd Qu.: 0.12433   3rd Qu.: 0.11636   3rd Qu.: 0.10836  \n##  Max.   : 1.68840   Max.   : 1.79119   Max.   : 2.07108   Max.   : 1.61259   Max.   
: 1.92828\n```\n\n```r\n#Box Plot\nboxplot(\n  Log2_Abund~Sample, data = gather(dfWide, Sample, Log2_Abund),\n  main = \"Original Log2 Ratios\"\n  )\n```\n\n![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1-1.png)\n\nProteomic data may require normalization. Let's normalize each sample to its median and check the resulting distributions:\n\n```r\n#For each column, subtract the median of the column from each of its values\ndfNorm \u003c- mapply('-', dfWide, apply(dfWide,2,median))\n#Transform into a dataframe\ndfNorm \u003c- as.data.frame(dfNorm, row.names = row.names(dfWide))\nboxplot(\n  Log2_Abund~Sample, data = gather(dfNorm, Sample, Log2_Abund),\n  main = \"Normalized Log2 Ratios\"\n  )\n```\n\n![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1-2.png)\n\n```r\n#If desired, use the normalized data for the downstream operations\ndfWide \u003c- dfNorm\ndfWide$Accession \u003c- row.names(dfWide)\ndfLong \u003c- gather(dfWide, Sample, Log2_Abund, -Accession)\ndfWide$Accession \u003c- NULL\n```\n\n\nLook at the principal component analysis of the samples:\n```r\ndfWide.t \u003c-  t(dfWide)\ndfWide.pca \u003c- prcomp(dfWide.t, center = TRUE, scale. 
= FALSE)\nsummary(dfWide.pca)\n```\n\n```\n## Importance of components:\n##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7     PC8     PC9      PC10\n## Standard deviation     4.7762 3.8970 2.8019 1.78159 1.61259 1.53323 1.51155 1.40213 1.36160 2.443e-15\n## Proportion of Variance 0.3797 0.2528 0.1307 0.05283 0.04328 0.03913 0.03803 0.03272 0.03086 0.000e+00\n## Cumulative Proportion  0.3797 0.6325 0.7631 0.81598 0.85926 0.89839 0.93642 0.96914 1.00000 1.000e+00\n```\n\nLet's plot the first 4 principal components, as they explain the bulk of the variation within the data set:\n\n```r\ndfWide.pca \u003c- as.data.frame(dfWide.pca$x)\ndfWide.pca$Group \u003c- sapply(\n  as.character( row.names(dfWide.pca) ),\n  function(x) {\n    strsplit(x, \"_\")[[1]][[1]]\n  }\n)\n#Principal components 1 and 2\nggplot(\n  dfWide.pca,\n  aes(x = PC1, y = PC2, colour = Group )\n  ) +\n  geom_point(shape=19, size=4, alpha = 0.7)+\n  geom_hline(yintercept = 0, colour = \"gray65\") +\n  geom_vline(xintercept = 0, colour = \"gray65\") +\n  ggtitle(\"PCA On Proteins\") +\n  theme_classic()\n```\n\n![TKO_PCA_components_1_2](figure/PCA_PCs12.png)\n```r\n#Principal components 3 and 4\nggplot(\n  dfWide.pca,\n  aes(x = PC3, y = PC4, colour = Group )\n) +\n  geom_point(shape=19, size=4, alpha = 0.7)+\n  geom_hline(yintercept = 0, colour = \"gray65\") +\n  geom_vline(xintercept = 0, colour = \"gray65\") +\n  ggtitle(\"PCA On Proteins\") +\n  theme_classic()\n```\n![TKO_PCA_components_3_4](figure/PCA_PCs34.png)\n\nOne-way ANOVA analysis:\n```r\ncols_anova \u003c- c(\"his4_1\", \"his4_2\", \"his4_3\", \"met6_1\", \"met6_2\", \"met6_3\", \"ura2_1\", \"ura2_2\", \"ura2_3\")\n#Create the named list with our groups\ngroups \u003c- list()\nfor (i in cols_anova) {\n  local_name \u003c- strsplit(i, \"_\")[[1]][[1]]\n  if ( local_name %in% names(groups) ) {\n    groups[[local_name]] \u003c- c(groups[[local_name]], i)\n  } else {\n    groups[[local_name]] \u003c- c(i)\n  
}\n}\ngroups\n```\n\n```\n## $his4\n## [1] \"his4_1\" \"his4_2\" \"his4_3\"\n## \n## $met6\n## [1] \"met6_1\" \"met6_2\" \"met6_3\"\n## \n## $ura2\n## [1] \"ura2_1\" \"ura2_2\" \"ura2_3\"\n```\n\n```r\ndfANOVA \u003c- dfWide\ndfANOVA$anovaPval \u003c- apply(dfANOVA, 1, function(x) {\n  df \u003c- as.data.frame(x)\n  #Select the samples for ANOVA\n  #Important if you need to exclude some of the samples from the calculation\n  #cols_anova \u003c- c(\"his4_1\", \"his4_2\", \"his4_3\", \"met6_1\", \"met6_2\", \"met6_3\", \"ura2_1\", \"ura2_2\", \"ura2_3\")\n  df$Sample \u003c- rownames(df)\n  df \u003c- df[ df$Sample %in% cols_anova, ]\n  \n  #Define groups in sync with the selected columns\n  #OBS: format-dependent\n  df$Group \u003c- as.vector(\n    sapply(\n      cols_anova,\n      function(x) { strsplit(x, \"_\")[[1]][[1]] }\n      )\n    )\n  anovaResults \u003c- aov(x ~ Group, data = df)\n  #This Very exciting expression is how to extract the p-value from the aov summary\n  return(summary(anovaResults)[[1]][\"Pr(\u003eF)\"][[1]][[1]])\n})\n#Benjamini-Hochberg correction for multiple testing\ndfANOVA$adjPval \u003c- p.adjust(dfANOVA$anovaPval, method = \"BH\")\n#Add group averages\nfor ( i in names(groups) ) {\n  dfANOVA[i] \u003c- apply(\n    dfANOVA, 1, function(x) {\n      #print(x)\n      #print(typeof(x))\n      mean( x[ groups[[i]] ] )\n    }\n  )\n}\n#Add max LogFC\ndfANOVA$MaxLog2FC \u003c- apply(\n  dfANOVA, 1, function(x) {\n    log_means \u003c- x[ names(groups) ]\n    max(log_means) - min(log_means)\n  }\n)\n```\n\n\nSelect proteins with adjusted ANOVA P \u003c= 0.05 and FC \u003e log2(1.3) (30%)\n\n```r\ndfANOVA.Sign \u003c- dfANOVA %\u003e%\n  filter(adjPval \u003c= 0.05 \u0026 MaxLog2FC \u003e= log2(1.3) ) %\u003e%\n  select(cols_anova)\ndim(dfANOVA.Sign)\n```\n\n```\n## [1] 190  9\n```\n\n```r\nheatmap(\n  as.matrix( dfANOVA.Sign ), scale = \"none\",\n  col = colorRampPalette(c(\"red\", \"grey\", \"blue\"))(256)\n)\n```\n\n![plot of chunk 
unnamed-chunk-1](figure/unnamed-chunk-1-8.png)\n\n\nThe classic way to compare levels in two groups is the Student's t-test.\u003cbr\u003e\nLet's compare the *met6* knockouts and *his4* knockouts as an example:\n```r\ncalc_ttest \u003c- function(df, groupping, gr1, gr2, maxAdjP, minFC) {\n  df \u003c- df[ c( groupping[[gr1]], groupping[[gr2]]  ) ]\n  #Log2 fold change group2 - group1\n  df$Log2FC \u003c- apply(\n    df, 1, function(x) {\n      mean( x[ groupping[[gr2]] ] ) - mean( x[ groupping[[gr1]] ] )\n    }\n  )\n  #T-test with equal variance\n  df$T_Pval \u003c- apply(\n    df, 1, function(x) {\n      res \u003c- t.test(\n        x[ groupping[[gr2]] ], x[ groupping[[gr1]] ],\n        alternative = \"two.sided\", var.equal = TRUE\n        )\n      mean( x[ groupping[[gr2]] ] ) - mean( x[ groupping[[gr1]] ] )\n      res$p.value\n    }\n  )\n  #Benjamini-Hochberg correction for multiple testing\n  df$adjPval \u003c- p.adjust(df$T_Pval, method = \"BH\")\n  df$Log10adjPval \u003c- -1*log10(df$adjPval)\n  #Add the categorical column\n  df$Diff_Abund \u003c- apply(\n    df, 1, function(x) {\n      if (x[[\"adjPval\"]] \u003c= maxAdjP \u0026 x[[\"Log2FC\"]] \u003e= minFC) {\n        return( paste(\"Up in\", gr2) )\n      } else if (x[[\"adjPval\"]] \u003c= maxAdjP \u0026 x[[\"Log2FC\"]] \u003c= -1*minFC) {\n        return( paste(\"Up in\", gr1) )\n      } else {\n        return('Non-significant')\n      }\n    }\n  )\n  df\n}\nmaxAdjP \u003c- 0.05\nminLog2FC \u003c- round(log2(1.3), 3)\ngr1 \u003c- \"met6\"\ngr2 \u003c- \"his4\"\ndfTtest \u003c- calc_ttest(dfWide, groups, gr1, gr2, maxAdjP, minLog2FC )\n#How many proteins have adj. 
P \u003c= 0.05?\ndim( filter(dfTtest, adjPval \u003c= maxAdjP) )\n```\n```\n## [1] 493  11\n```\nLet's add the minimal fold change threshold to concentrate on the strongest significant effects.\u003cbr\u003e\n116 proteins pass both filtering criteria:\n```r\ndim(dfTtest %\u003e%\n      filter(adjPval \u003c= maxAdjP) %\u003e%\n      filter( Log2FC \u003e= minLog2FC | Log2FC \u003c= -1*minLog2FC ))\n```\n\n```\n## [1] 116  11\n```\n\nVolcano plot for the met6-his4 comparison:\n```r\n#Add columns with the gene names and other info\ndfTtest \u003c- merge(\n  dfTtest, all_proteins,\n  by.x=\"row.names\", by.y=\"Accession\",\n  suffixes=c(\"\", \"_\"), sort=FALSE\n  )\n\n#Volcano plot\nggplot(\n  dfTtest,\n  aes(x = Log2FC, y = Log10adjPval, colour = Diff_Abund )\n) +\n  geom_point(shape=19, size=2, alpha = 0.6)+\n  geom_hline(yintercept = -1*log10(maxAdjP), colour = \"gray65\") +\n  geom_vline(xintercept = 0, colour = \"gray65\") +\n  geom_vline(xintercept = -1*minLog2FC, colour = \"gray65\") +\n  geom_vline(xintercept = minLog2FC, colour = \"gray65\") +\n  ggtitle(\n    paste(\n      \"T-test \", gr1, \" vs \", gr2,\n      \" Adjusted P-value\u003c=\", maxAdjP, \" Log2 FC\u003e=\", minLog2FC,\n      sep=\"\"\n      )\n    ) +\n  theme_classic() +\n  theme(\n    legend.title = element_blank(), legend.text = element_text(size=12),\n    plot.title = element_text(size=16)\n    ) +\n  labs(x = paste(\"Log2 FC\", gr2, \"-\", gr1), y = \"-Log10 Adj. 
P-value\" ) +\n  geom_text(\n    data = subset(dfTtest, Log2FC \u003e=0.9 | Log2FC \u003c= -0.8),\n    aes( Log2FC, Log10adjPval, label = Gene),\n    alpha = 0.6, hjust = 0.5, vjust = -0.6\n    )\n```\n\n![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1-9.png)\n\n\n[DEqMS](https://www.bioconductor.org/packages/release/bioc/html/DEqMS.html) is an R package for differential expression analysis that was developed specifically for proteomic data by Yafeng Zhu in Janne Lehtiö's lab in Stockholm. 
The algorithm estimates the protein variance based on the number of peptides or PSMs, claiming better accuracy in detecting differences in protein levels; see [the open-access MCP article](https://www.mcponline.org/article/S1535-9476(20)34997-5/fulltext) for details. DEqMS builds on the widely used [limma package](https://bioconductor.org/packages/release/bioc/html/limma.html); the p-values and BH-adjusted p-values from limma are reported as well, alongside the spectra-count-adjusted posterior p-values and BH-adjusted p-values calculated by DEqMS.\u003cbr\u003e\nLet's apply the algorithm to the TKO data:\n\n```r\ndfD \u003c- dfWide[cols_anova]\n#Define the design vector\ncond = as.factor(\n  c(\"his4\", \"his4\", \"his4\", \"met6\", \"met6\", \"met6\", \"ura2\", \"ura2\", \"ura2\")\n)\ndesign = model.matrix(~0+cond)\ncolnames(design) = gsub(\"cond\",\"\",colnames(design))\n#Make contrasts\nx \u003c- c(\n  \"his4-met6\", \"his4-ura2\", \"ura2-met6\" \n  )\ncontrast =  makeContrasts(contrasts=x,levels=design)\nfit1 \u003c- lmFit(dfD, design)\nfit2 \u003c- contrasts.fit(fit1,contrasts = contrast)\nfit3 \u003c- eBayes(fit2)\n#Extract PSM count information\npsm_count_table \u003c- dfD %\u003e%\n  merge(\n    all_proteins[c(\"Accession\", \"Number.of.PSMs\")],\n    by.x=\"row.names\", by.y=\"Accession\",  suffixes=c(\"\", \"_\"), sort=FALSE\n    )\nrow.names(psm_count_table) \u003c- psm_count_table$Row.names\npsm_count_table \u003c- psm_count_table[c(\"Number.of.PSMs\")]\nfit3$count = psm_count_table[rownames(fit3$coefficients),\"Number.of.PSMs\"]\nfit4 = spectraCounteBayes(fit3)\n```\n\nThe data-dependent variance distribution looks as follows:\n```r\nVarianceBoxplot(\n  fit4, n=30, main=\"TKO Variance according to DEqMS\", xlab=\"PSM count\"\n  )\n```\n\n![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1-10.png)\n\nLet's look at the contrast his4-met6, column number 1:\n\n```r\ncurrent_contrast \u003c- colnames(fit4$coefficients)[[1]]\nres_DEqMS = outputResult(fit4, 
coef_col = 1)\nhead(res_DEqMS)\n```\n\n```\n##             logFC     AveExpr         t      P.Value    adj.P.Val        B   gene count      sca.t\n## P00815 -2.8532509 -0.54962113 -85.67738 8.428621e-13 1.318236e-09 18.82971 P00815    20 -108.89141\n## P05694  3.5999485 -0.85859918  77.41234 1.846004e-12 1.443575e-09 18.37417 P05694    30   96.37396\n## P37291 -0.6922910  0.40048261 -28.07419 4.585032e-09 1.434198e-06 11.77468 P37291    36  -42.07175\n## P15992 -0.6353142  0.44930312 -27.22447 5.803328e-09 1.512734e-06 11.53513 P15992    18  -36.62270\n## P39954 -0.5661252  0.01364542 -22.95737 2.138660e-08 3.344865e-06 10.18583 P39954    22  -31.96714\n## Q12443 -1.2662064  0.60896317 -44.48587 1.329146e-10 6.929282e-08 15.15264 Q12443     4  -30.06690\n##         sca.P.Value sca.adj.pval\n## P00815 8.998668e-18 1.407392e-14\n## P05694 3.360470e-17 2.627888e-14\n## P37291 2.531791e-13 1.319907e-10\n## P15992 1.121588e-12 4.385408e-10\n## P39954 4.810014e-12 1.504572e-09\n## Q12443 9.260675e-12 2.413949e-09\n```\nAdd negative log10-transformed adjusted p-values for plotting:\n\n```r\nres_DEqMS$log.adj.P.Val  = -log10(res_DEqMS$adj.P.Val )\nres_DEqMS$log.sca.adj.pval = -log10(res_DEqMS$sca.adj.pval)\n```\nHow many proteins have adj. P \u003c= 0.05 according to LIMMA?\n```r\nmaxAdjP \u003c- 0.05\nminLog2FC \u003c- round(log2(1.3), 3)\ndim(\n  filter(res_DEqMS, adj.P.Val \u003c= maxAdjP)\n)\n```\n\n```\n## [1] 611  13\n```\nHow many proteins have adj. P \u003c= 0.05 and Log2 FC \u003e= log2(1.3) [30%] according to LIMMA?\n```r\ndim(res_DEqMS %\u003e%\n      filter(adj.P.Val \u003c= maxAdjP) %\u003e%\n      filter( logFC \u003e= minLog2FC | logFC \u003c= -1*minLog2FC ))\n```\n\n```\n## [1] 148  13\n```\nHow many proteins have adj. P \u003c= 0.05 according to DEqMS?\n```r\ndim(\n  filter(res_DEqMS, sca.adj.pval \u003c= maxAdjP)\n)\n```\n\n```\n## [1] 645  13\n```\nHow many proteins have adj. 
P \u003c= 0.05 and Log2 FC \u003e= log2(1.3) [30%] according to DEqMS?\n```r\ndim(res_DEqMS %\u003e%\n      filter(sca.adj.pval \u003c= maxAdjP) %\u003e%\n      filter( logFC \u003e= minLog2FC | logFC \u003c= -1*minLog2FC ))\n```\n\n```\n## [1] 154  13\n```\n\n```r\ngr1 \u003c- strsplit(current_contrast, \"-\")[[1]][[2]]\ngr2 \u003c- strsplit(current_contrast, \"-\")[[1]][[1]]\n#Add categorical annotation columns for LIMMA and DEqMS\nres_DEqMS$Diff_LIMMA \u003c- apply(\n  res_DEqMS, 1, function(x) {\n    local_p \u003c- as.numeric(x[[\"adj.P.Val\"]])\n    local_fc \u003c- as.numeric(x[[\"logFC\"]])\n    if (local_p \u003c= maxAdjP \u0026 local_fc \u003e= minLog2FC) {\n      return( paste(\"Up in\", gr2) )\n    } else if (local_p \u003c= maxAdjP \u0026 local_fc \u003c= -1*minLog2FC) {\n      return( paste(\"Up in\", gr1) )\n    } else {\n      return('Non-significant')\n    }\n  }\n)\nres_DEqMS$Diff_DEqMS \u003c- apply(\n  res_DEqMS, 1, function(x) {\n    local_p \u003c- as.numeric(x[[\"sca.adj.pval\"]])\n    local_fc \u003c- as.numeric(x[[\"logFC\"]])\n    if (local_p \u003c= maxAdjP \u0026 local_fc \u003e= minLog2FC) {\n      return( paste(\"Up in\", gr2) )\n    } else if (local_p \u003c= maxAdjP \u0026 local_fc \u003c= -1*minLog2FC) {\n      return( paste(\"Up in\", gr1) )\n    } else {\n      return('Non-significant')\n    }\n  }\n)\n```\n\nVolcano plot for LIMMA\n```r\nggplot(\n  res_DEqMS,\n  aes(x = logFC, y = log.adj.P.Val, colour = Diff_LIMMA )\n) +\n  geom_point(shape=19, size=2, alpha = 0.6)+\n  geom_hline(yintercept = -1*log10(maxAdjP), colour = \"gray65\") +\n  geom_vline(xintercept = 0, colour = \"gray65\") +\n  geom_vline(xintercept = -1*minLog2FC, colour = \"gray65\") +\n  geom_vline(xintercept = minLog2FC, colour = \"gray65\") +\n  ggtitle(\n    paste(\n      \"LIMMA \", current_contrast,\n      \" Adjusted P-value\u003c=\", maxAdjP, \" Log2 FC\u003e=\", minLog2FC,\n      sep=\"\"\n    )\n  ) +\n  theme_classic() +\n  theme(\n    legend.title = 
element_blank(), legend.text = element_text(size=12),\n    plot.title = element_text(size=16)\n  ) +\n  labs(x = paste(\"Log2 FC\",current_contrast), y = \"-Log10 Adj. P-value\" ) +\n  geom_text(\n    data = subset(res_DEqMS, logFC \u003e=1.0 | logFC \u003c= -0.8),\n    aes( logFC, log.adj.P.Val, label = Gene),\n    alpha = 0.6, hjust = 0.5, vjust = -0.6\n  )\n```\n\n![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1-11.png)\n\nVolcano plot for DEqMS\n```r\nggplot(\n  res_DEqMS,\n  aes(x = logFC, y = log.sca.adj.pval, colour = Diff_DEqMS )\n) +\n  geom_point(shape=19, size=2, alpha = 0.6)+\n  geom_hline(yintercept = -1*log10(maxAdjP), colour = \"gray65\") +\n  geom_vline(xintercept = 0, colour = \"gray65\") +\n  geom_vline(xintercept = -1*minLog2FC, colour = \"gray65\") +\n  geom_vline(xintercept = minLog2FC, colour = \"gray65\") +\n  ggtitle(\n    paste(\n      \"DEqMS \", current_contrast,\n      \" Adjusted P-value\u003c=\", maxAdjP, \" Log2 FC\u003e=\", minLog2FC,\n      sep=\"\"\n    )\n  ) +\n  theme_classic() +\n  theme(\n    legend.title = element_blank(), legend.text = element_text(size=12),\n    plot.title = element_text(size=16)\n  ) +\n  labs(x = paste(\"Log2 FC\",current_contrast), y = \"-Log10 Adj. P-value\" ) +\n  geom_text(\n    data = subset(res_DEqMS, logFC \u003e=0.9 | logFC \u003c= -0.8),\n    aes( logFC, log.sca.adj.pval, label = Gene),\n    alpha = 0.6, hjust = 0.5, vjust = -0.6\n  )\n```\n\n![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1-12.png)\n\n\nFurthermore, gene set enrichment using [fgsea package](http://bioconductor.org/packages/release/bioc/html/fgsea.html) can be employed for basic functional annotation.\u003cbr\u003e\nI downloaded the Gene Ontology annotations for yeast from [Uniprot](https://www.uniprot.org/proteomes/UP000002311), with gene names as identifiers. GO Consortium data is available under [Creative Commons Attribution 4.0 Unported License](https://creativecommons.org/licenses/by/4.0/legalcode). 
Let's use the GO Biological Process as an example; the gmt file can be found in this repository as well:\n\n```r\nannotations \u003c- gmtPathways(\"go-bp_gene-symbols_uniprot.gmt\")\n\n#Prepare the rank file for the his4-met6 contrast\n#Replace the UNIPROT accession with the Gene name\ndfGSEA \u003c- res_DEqMS[c(\"Gene\", \"logFC\", \"log.sca.adj.pval\")]\n```\nUse the product (Log2FC * -Log10(adj.P.val.SCA)) to rank genes in the table:\n```r\ndfGSEA$rank \u003c- apply(\n  dfGSEA, 1, function(x) {\n    as.numeric(x[[2]]) * as.numeric(x[[3]])\n  }\n)\n\nranks \u003c- as.numeric(dfGSEA$rank)\nnames(ranks) \u003c- dfGSEA$Gene\nranks \u003c- sort(ranks)\nranks[1:10]\n```\n\n```\n##       HIS4       RTN2      SIP18      HSP12       GCV2       SHM2      HSP26       SAM1       SAH1       GPH1 \n## -39.522063 -10.911242  -8.174439  -7.404254  -6.980468  -6.839459  -5.945264  -5.255033  -4.994689  -4.869912\n```\n\nRun fgsea and go with the default settings for enrichment:\n```r\nres_gsea \u003c- fgseaMultilevel(annotations, ranks, minSize=15, maxSize=500)\nhead(res_gsea)\n```\n\n```\n##                                        pathway      pval      padj    log2err         ES        NES size\n## 1:                     DNA repair [GO:0006281] 0.5493827 0.9792398 0.07569463  0.4890924  0.9355784   20\n## 2:    actin filament organization [GO:0007015] 0.9277344 0.9939024 0.04802204 -0.3478877 -0.6806408   21\n## 3:            aerobic respiration [GO:0009060] 0.4152893 0.9675056 0.09110731  0.5546527  1.0388591   18\n## 4: carbohydrate metabolic process [GO:0005975] 0.7070312 0.9792398 0.06064040 -0.4277279 -0.8368478   21\n## 5:                     cell cycle [GO:0007049] 0.8966942 0.9939024 0.05216303  0.3572035  0.6690387   18\n## 6:                  cell division [GO:0051301] 0.2551440 0.9181034 0.12154328  0.6158352  1.1780231   20\n##                            leadingEdge\n## 1: HRT1,BDF1,HTA2,RFA2,MGM101,MLP1,...\n## 2:                      ENT2,SAC6,RHO1\n## 3:     
RIB1,MIX14,RIB3,QCR7,MIX17,RIP1\n## 4:   PGM2,CIT1,GPD1,GLC3,TAL1,MDH1,...\n## 5:                     CDC25,BNI5,NSP1\n## 6:                     HRT1,CDC25,UBC9\n```\n\nTen pathways with the largest positive enrichment scores:\n```r\nres_gsea[ES \u003e 0][head(order(pval), n=10),]\n```\n\n```\n##                                                                                                   pathway       pval\n##  1: maturation of LSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA) [GO:0000463] 0.05785124\n##  2:                                                                     chromatin remodeling [GO:0006338] 0.06918239\n##  3:                                                       ribosomal large subunit biogenesis [GO:0042273] 0.07171315\n##  4:                                                       fungal-type cell wall organization [GO:0031505] 0.07370518\n##  5:                                                       transcription by RNA polymerase II [GO:0006366] 0.08176101\n##  6:                                negative regulation of transcription by RNA polymerase II [GO:0000122] 0.10062893\n##  7:                                         regulation of transcription by RNA polymerase II [GO:0006357] 0.17283951\n##  8:                                                                      ribosome biogenesis [GO:0042254] 0.19591837\n##  9:                                positive regulation of transcription by RNA polymerase II [GO:0045944] 0.24696356\n## 10:                                                                            cell division [GO:0051301] 0.25514403\n##          padj   log2err        ES      NES size                            leadingEdge\n##  1: 0.4832041 0.2712886 0.7913647 1.482218   18              NOP2,NOP12,CIC1,TIF6,RPF2\n##  2: 0.4832041 0.2489111 0.7765741 1.440644   17     CHZ1,CBF1,HTZ1,TAF14,BDF1,CYC8,...\n##  3: 0.4832041 0.2377938 0.7208805 1.497722   30 NOP2,RPL26A,RPL26B,TIF6,MAK21,REH1,...\n##  4: 
0.4832041 0.2343926 0.6997807 1.462006   31      CIS3,ZEO1,BAR1,BGL2,EXG1,CRH1,...\n##  5: 0.4832041 0.2279872 0.7579933 1.400889   16    RPC10,RPB9,TAF14,BUR6,TFA1,RPB3,...\n##  6: 0.5132075 0.2042948 0.7395614 1.366824   16     PAF1,RPA12,CBF1,BUR6,ESS1,RAP1,...\n##  7: 0.8013468 0.1511488 0.6526132 1.248375   20      PAF1,CBF1,SPN1,HTZ1,TUP1,CYC8,...\n##  8: 0.8326531 0.1404062 0.6386159 1.241881   21         RPC10,RPA14,RPA12,RPL40A,RPC19\n##  9: 0.9181034 0.1226792 0.5842056 1.187375   26      KAE1,CBF1,BUR6,ESS1,RAP1,TUP1,...\n## 10: 0.9181034 0.1215433 0.6158352 1.178023   20                        HRT1,CDC25,UBC9\n```\n\nTen pathways with the largest negative enrichment scores:\n```r\nres_gsea[ES \u003c 0][head(order(pval), n=10),]\n```\n\n```\n##                                                                                pathway        pval      padj    log2err\n##  1:                                purine nucleotide biosynthetic process [GO:0006164] 0.006649502 0.3391246 0.40701792\n##  2:                                 cellular response to oxidative stress [GO:0034599] 0.040935673 0.4832041 0.31532483\n##  3:                                       ergosterol biosynthetic process [GO:0006696] 0.041904762 0.4832041 0.30775005\n##  4:                                                    glycolytic process [GO:0006096] 0.085271318 0.4832041 0.21392786\n##  5: retrograde vesicle-mediated transport, Golgi to endoplasmic reticulum [GO:0006890] 0.535019455 0.9792398 0.07417590\n##  6:                                                           endocytosis [GO:0006897] 0.543859649 0.9792398 0.07343814\n##  7:                                                           translation [GO:0006412] 0.609523810 0.9792398 0.06658921\n##  8:                                      ribosomal large subunit assembly [GO:0000027] 0.705426357 0.9792398 0.06037864\n##  9:                                        carbohydrate metabolic process [GO:0005975] 0.707031250 0.9792398 
0.06064040\n## 10:                                                       rRNA processing [GO:0006364] 0.727626459 0.9792398 0.05909548\n##             ES        NES size                        leadingEdge\n##  1: -0.8517795 -1.5621756   15 MTD1,ADE17,ADE4,ADE2,ADE6,ADE8,...\n##  2: -0.6998521 -1.5312424   39 HSP12,ACT1,TPS1,GAD1,YHB1,MCR1,...\n##  3: -0.7858271 -1.4633302   16        ERG27,ERG13,MCR1,ERG20,ERG9\n##  4: -0.7048514 -1.3727129   20 HXK1,GLK1,CDC19,TDH1,PFK2,PGI1,...\n##  5: -0.5006317 -0.9619512   19    RER1,YPT1,SEC21,COP1,GET3,ERV46\n##  6: -0.4312358 -0.9435230   39                ACT1,CHC1,ENT2,DNF1\n##  7: -0.4857894 -0.9046142   16                             RPL24B\n##  8: -0.4277358 -0.8330244   20                        RPL24B,RPL3\n##  9: -0.4277279 -0.8368478   21  PGM2,CIT1,GPD1,GLC3,TAL1,MDH1,...\n```\nLet's look at one of the pathways from the top with positive enrichment scores:\n```r\nplotEnrichment(\n  annotations[[\"ribosomal large subunit biogenesis [GO:0042273]\"]],\n  ranks\n) + labs(title=\"ribosomal large subunit biogenesis [GO:0042273]\")\n```\n\n![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1-13.png)\n\n```r\nannotations[[\"ribosomal large subunit biogenesis [GO:0042273]\"]]\n```\n\n```\n##  [1] \"ACL4\"    \"AFG2\"    \"ALB1\"    \"EBP2\"    \"ERB1\"    \"HAS1\"    \"HRR25\"   \"JIP5\"    \"JJJ1\"    \"LOC1\"    \"MAK11\"   \"MAK16\"  \n## [13] \"MAK21\"   \"MAK5\"    \"MRT4\"    \"NIP7\"    \"NOC2\"    \"NOC3\"    \"NOG1\"    \"NOP15\"   \"NOP16\"   \"NOP2\"    \"NOP4\"    \"NOP7\"   \n## [25] \"NOP8\"    \"NSA1\"    \"NSA2\"    \"PRP43\"   \"PUF6\"    \"REH1\"    \"REI1\"    \"RIX7\"    \"RLI1\"    \"RLP24\"   \"RLP7\"    \"RPL14A\" \n## [37] \"RPL14B\"  \"RPL26A\"  \"RPL26B\"  \"RPL33A\"  \"RPL33B\"  \"RPL7A\"   \"RPL7B\"   \"RRP14\"   \"RRP8\"    \"RRS1\"    \"SDA1\"    \"SQT1\"   \n## [49] \"SYO1\"    \"TIF4631\" \"TIF6\"    \"TRM112\"  \"YTM1\"\n```\n\nAnd at one of the pathways from the top with negative 
enrichment scores:\n```r\nplotEnrichment(\n  annotations[[\"purine nucleotide biosynthetic process [GO:0006164]\"]],\n  ranks\n) + labs(title=\"purine nucleotide biosynthetic process [GO:0006164]\")\n```\n\n![plot of chunk unnamed-chunk-1](figure/unnamed-chunk-1-14.png)\n\n```r\nannotations[[\"purine nucleotide biosynthetic process [GO:0006164]\"]]\n```\n\n```\n##  [1] \"ADE1\"   \"ADE12\"  \"ADE13\"  \"ADE16\"  \"ADE17\"  \"ADE2\"   \"ADE3\"   \"ADE4\"   \"ADE5,7\" \"ADE6\"   \"ADE8\"   \"MIS1\"   \"MTD1\"  \n## [14] \"PRS1\"   \"PRS2\"   \"PRS3\"   \"PRS4\"   \"PRS5\"\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdev-ev%2Fisobaric-prot-wf-r","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdev-ev%2Fisobaric-prot-wf-r","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdev-ev%2Fisobaric-prot-wf-r/lists"}