{"id":18501592,"url":"https://github.com/const-ae/prodd_old","last_synced_at":"2025-06-13T06:06:25.838Z","repository":{"id":82391892,"uuid":"111397320","full_name":"const-ae/proDD_old","owner":"const-ae","description":"Differential Detection for Label-free (LFQ) Mass Spec Data","archived":false,"fork":false,"pushed_at":"2017-11-20T11:23:25.000Z","size":7418,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-13T06:02:02.424Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/const-ae.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-20T10:44:17.000Z","updated_at":"2018-11-07T17:14:24.000Z","dependencies_parsed_at":"2023-04-19T23:32:04.732Z","dependency_job_id":null,"html_url":"https://github.com/const-ae/proDD_old","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/const-ae/proDD_old","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2FproDD_old","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2FproDD_old/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2FproDD_old/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2FproDD_old/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/const-ae","download_url":"https://codeload.github.com/cons
t-ae/proDD_old/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2FproDD_old/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259592260,"owners_count":22881265,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T13:54:21.918Z","updated_at":"2025-06-13T06:06:25.790Z","avatar_url":"https://github.com/const-ae.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\ntitle: \"proDD\"\noutput: github_document\n---\n\n```{r setup, include=FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  cache = TRUE,\n  fig.path = \"tools/README-fig/\",\n  cache.path = \"tools/README-cache/\",\n  message = FALSE,\n  warning = FALSE\n)\n```\n\nDifferential Detection with Label-free Mass Spec Data\n\n## Overview\n\nThis package provides a framework to find proteins in mass spec data that are differentially detected between groups.\nIt is designed to deal with a high number of missing values (i.e. zeros) and can nonetheless give reliable significance\nestimates.\n\nIt is thus applicable to data from affinity purification experiments such as BioID.\n\n## Method\n\nThe algorithm is built around the fact that missing values are more likely to occur if the intensity of a protein\nis low, which means that a missing observation is itself informative. In the first step, the algorithm quantifies this dependency by\nfitting a logistic regression of the probability of missing a value on the underlying intensity. 
This model\nis fitted using the Hamiltonian Monte Carlo method, because precise estimates of the sigmoid are necessary for reliable\ndownstream calculations. In the second step, the group means for each condition and protein are estimated using a\nmaximum likelihood approach. In the last step, a moderated\nt-test is applied to each protein to find which proteins are significantly differentially detected between groups.\n\nUnlike other approaches suggested in the literature that rely on imputing missing values using _ad hoc_\nmethods, such as using half the global minimum, proDD exploits the information provided by the zeros in a \nstructured way and focuses on the MLE of the group means, which are sufficient to establish significance.\n\n## Workflow\n\nInstallation\n\n```{r}\n# Install directly from GitHub\ndevtools::install_github(\"const-ae/proDD\")\n```\n\n\nLet's assume that `X` is a matrix where each row contains the intensities for one protein and each column is one \nsample; the samples can be grouped into conditions.\n\n```{r, echo=FALSE}\nlibrary(proDD)\nsource(\"tests/testthat/helper_datageneration.R\")\ndata \u003c- generate_zero_inflated_data_with_effect(N_genes=100, N_rep=3, perc_changed = 0, mu0=8.5, nu0=5, sigma0=0.4, location=8, scale=-0.3)\nX \u003c- cbind(data$X, data$Y)\ncolnames(X) \u003c- c(paste0(\"A_\", 1:3), paste0(\"B_\", 1:3))\nX \u003c- X[rowSums(X) != 0, ]\n```\n\n```{r}\nlibrary(proDD)\nhead(X, n=10)\n```\n\n```{r, echo=FALSE}\nComplexHeatmap::Heatmap((X != 0)*1.0, cluster_rows=FALSE, cluster_columns= FALSE,\n                        col=c(\"black\", \"lightgrey\"), name=\"Value Observed\")\n```\n\n\nFor subsequent steps, a description of the samples is necessary, i.e. 
which sample belongs to which condition.\nFor this we will create a data frame containing that information:\n\n```{r}\ndata_description \u003c-  data.frame(Condition=as.factor(c(rep(\"A\", 3), rep(\"B\", 3))), \n                                Replicate=c(1:3, 1:3))\ndata_description$Sample \u003c- paste0(data_description$Condition, data_description$Replicate)\ndata_description\n\ndesign \u003c-  model.matrix(Sample ~ Condition - 1, data_description)\ndesign\n```\n\nNow we can apply the algorithm, which consists of four steps, to the data:\n\n1. Estimate the parameters for the variance moderation:\n\n    ```{r}\n    vm_est \u003c- estimate_variance_moderation(X, design)\n    ```\n\n2. Estimate the sigmoid that describes the chance to miss an observation:\n\n    ```{r}\n    sig_est \u003c- estimate_sigmoid(X, data_description, vm_est$nu_est, vm_est$sigma2_est, chains=1)\n    ```\n\n3. Estimate the means of each condition per protein:\n\n    ```{r}\n    group_locations \u003c- estimate_group_means(X, design, vm_est$nu_est, vm_est$sigma2_est, sig_est$location_est, sig_est$scale_est)\n    ```\n\n4. Lastly, apply the moderated t-test to the group means to find differentially detected proteins:\n\n    ```{r}\n    result \u003c- detect_differences(X, design, data_description, d0=vm_est$nu_est, s0=vm_est$sigma2_est,\n                                 group_locations=group_locations, comparison=c(\"A\", \"B\"))\n    \n    head(result, n=10)\n    ```\n\n\n\n## Note\n\nThis project is still a work in progress. Although the algorithm works well, the API will probably change dramatically.\n\n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconst-ae%2Fprodd_old","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconst-ae%2Fprodd_old","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconst-ae%2Fprodd_old/lists"}