{"id":18880213,"url":"https://github.com/genepattern/dapple","last_synced_at":"2025-04-14T19:32:20.649Z","repository":{"id":42985401,"uuid":"141158575","full_name":"genepattern/DAPPLE","owner":"genepattern","description":"Disease Association Protein-Protein Link Evaluator (Broad hosted algorithm)","archived":false,"fork":false,"pushed_at":"2024-01-03T17:32:12.000Z","size":40625,"stargazers_count":5,"open_issues_count":0,"forks_count":3,"subscribers_count":6,"default_branch":"develop","last_synced_at":"2025-03-28T08:21:48.022Z","etag":null,"topics":["community","genepattern-module","gparc","user-contri"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/genepattern.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-16T15:34:39.000Z","updated_at":"2024-04-14T06:13:58.000Z","dependencies_parsed_at":"2024-11-08T06:42:45.834Z","dependency_job_id":"4c509c27-a9f6-42a3-9136-6ae3fe6fd3b3","html_url":"https://github.com/genepattern/DAPPLE","commit_stats":{"total_commits":56,"total_committers":4,"mean_commits":14.0,"dds":0.5535714285714286,"last_synced_commit":"422352602308324749f5c586862928bb25959e0b"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genepattern%2FDAPPLE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genepattern%2FDAPPLE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genepattern%2FDAPPLE/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/genepattern%2FDAPPLE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/genepattern","download_url":"https://codeload.github.com/genepattern/DAPPLE/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248946049,"owners_count":21187437,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["community","genepattern-module","gparc","user-contri"],"created_at":"2024-11-08T06:42:32.459Z","updated_at":"2025-04-14T19:32:15.632Z","avatar_url":"https://github.com/genepattern.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DAPPLE\n## Background\nDAPPLE stands for Disease Association Protein-Protein Link Evaluator. DAPPLE looks\nfor significant physical connectivity among proteins encoded by genes in loci\nassociated with disease, according to protein-protein interactions reported in the literature.\nThe hypothesis behind DAPPLE is that causal genetic variation affects a limited set of\nunderlying mechanisms that are detectable by protein-protein interactions. Please refer to\nthe DAPPLE publication for full details.\nDAPPLE takes as input a list of genes, SNPs or genomic regions. See the Inputs section for a\ndetailed description. It will build direct and indirect interaction networks from proteins\nencoded by seed genes. 
It will then assess the statistical significance of a number of\nnetwork connectivity parameters as well as of the connectivity of individual proteins to\nother seed proteins using a within-degree node-label permutation method. The individual\nprotein scores are then used to propose candidate genes in large loci. **Please note that the\nmost recent release uses adaptive permutation whereby networks that do not\nachieve P \u003c 0.1 (for any parameter) are stopped at 100 permutations.**\n## Inputs\nDAPPLE takes 4 types of input:\n- A list of **genes**. These are entered as one entry per line, either in a specified file or\ndirectly via the webpage interface. Each gene should be identified with its gene\nsymbol (i.e., HUGO) ID, such as \"ATXN1\". This mode should be used if the user\ndoes not want to group genes into regions, but rather wants each gene to stand as\nits own region.\n- A list of **SNPs**. These are entered as one SNP per line, either in a specified file or\ndirectly via the webpage interface. These SNPs must be in HapMap or 1KG,\nbecause this is how DAPPLE defines the 'wingspan' region around a gene, which\nis a function of linkage disequilibrium.\n- A list of **regions**. These are entered as one region per line. Each region should be\nentered as 'ID chr left right' where ID is a region identifier, chr is a number from\n1-23, left is the left boundary in genomic coordinates and right is the right\nboundary. The entries can be space or tab delimited.\n- A list of **genes with region identifiers**, or \"gene-regions\". These are entered as\none entry per line, either in a specified file or directly via the webpage interface.\nEach entry should be defined as 'gene ID' where gene is a gene name in gene\nsymbol (i.e., HUGO) ID, such as \"ATXN1\", and ID refers to a region to assign the\ngene to. 
Since DAPPLE is specifically looking for connectivity between regions -\nand not within regions - the user can group genes based on how they want to\ndefine groups.\n\nHere are some examples:  \n\n*Gene input*\n```\nPANK4\nHES5\nTNFRSF14\nMMEL1\nPADI4\nPTPN22\n```\n*SNP input*\n```\nrs3890745\nrs2240340\nrs2476601\nrs11586238\nrs7528684\nrs12746613\n```\n*Region input*\n```\nrs3890745 1 2395699 2744704\nrs2240340 1 17471282 17551282\nrs2476601 1 113874482 114254482\nrs11586238 1 117057482 117097482\nrs7528684 1 155807552 156083552\nrs12746613 1 159656552 159750570\n```\n*Gene-region input*\n```\nPANK4 rs3890745\nHES5 rs3890745\nTNFRSF14 rs3890745\nMMEL1 rs3890745\nPADI4 rs2240340\nPTPN22 rs2476601\n```\n*Combination input*\n```\nrs3087243\nrs6822844\nregion1 1 159656552 159750570\nregion2 1 196861967 197040967\nSLC26A10\nPIP4K2C\n```\n## Output Files\nDAPPLE outputs a number of files, all of which are described here. “FILE” refers to the\nkeyword input by the user.\n\n**FILE_summary**: This file contains the parameter values for the 4 network statistics\nmeasured: (1) the number of direct connections between seed proteins from different\nloci, (2) the average seed protein direct connectivity (a.k.a. direct binding degree), (3) the\naverage seed protein indirect connectivity (a.k.a. indirect binding degree) and (4) the\naverage common interactor binding degree (the average number of seed proteins that\ncommon interactors bind to).\n\n**FILE_NetStats**: This file contains the permutation p-values for the 4 network statistics\ndescribed in FILE_summary (i.e., what is the probability of seeing a parameter value \u003e=\nthe observed value by chance?).\n\n**FILE_SeedScores**: This file contains the individual p-values for seed proteins -\ngenerally, the probability that by chance the seed protein would be as connected to other\nseed proteins (directly or indirectly) as is observed. 
Please refer to the [publication's\nsupplementary materials](https://www.ncbi.nlm.nih.gov/pubmed/21249183) for exact details of p-value calculation. The file contains 4\ncolumns: gene ID, region ID, uncorrected p-value, corrected p-value.\n\n**FILE_GenesToPrioritize**: This file contains genes that achieved a corrected p-value less\nthan 0.05.\n\n**FILE_CIscores**: This file contains the p-values for common interactors that describe the\nprobability that by chance individual common interactors would be as connected to seed\nproteins as was observed.\n\n**FILE_directConnections**: This file contains a list of the direct connections in the\nnetwork.\n\n**FILE_plot**: If the user chose plot=true, this is the visualization of the network. Page 1\nshows the direct network and pages 2-3 show the indirect network. Colors of seeds\ncorrespond to region.\n\n**FILE_MissingGenes**: This file is important to pay attention to. If the input is SNPs or\nregions, this describes the genes in those input regions that are in the InWeb database in\ncontrast to those that aren't. If too many input proteins are not in the InWeb database (less\nthan 60% average inclusion), one should be careful about interpreting DAPPLE results.\n\n**FILE_permuted\\***: Values of permuted parameters\n\n## FAQ\n**What is DAPPLE testing?** The hypothesis behind DAPPLE is that causal genetic\nvariants affect common mechanisms and that these mechanisms can be inferred by\nlooking for physical connections between proteins encoded in disease-associated regions.\nDAPPLE is therefore testing whether the networks built from seed regions - both direct\nnetworks and indirect networks - are more connected than chance expectation. Chance\nexpectation is defined by the connectivity expected if connectivity were purely a function\nof the binding degree of participating proteins.\n\n**If I input a SNP, how is the region defined around that SNP?** The region is defined\nusing LD according to the HapMap. 
For a given SNP, we extend out to the region defined\nby SNPs in r^2\u003e=0.5 and then extend out to the nearest hotspots.\n\n**For a region, how are overlapping genes defined?** The hg18 gene list was downloaded\nfrom UCSC using Ensembl transcripts. Splice isoforms were then collapsed to define\nthe largest gene footprint from transcription start to transcription stop. Gene footprints\nwere then extended on either end to include 50kb of regulatory sequence by default,\nthough the user can specify a different regulatory region. Any gene footprint that overlaps\na region is included in that region. If a gene overlaps 2 regions, those regions are merged.\nIf the user would like to keep the regions separate, they should input genes and explicitly\nassign them to regions (the \"gene-regions\" input type described under Inputs).\n\n**Where does the protein-protein interaction data come from?** We use the InWeb3\ndatabase, published by Kasper Lage in 2007 \\([1](https://www.ncbi.nlm.nih.gov/pubmed/17344885),[2](https://www.ncbi.nlm.nih.gov/pubmed/27892958)\\). This database contains 428,430 reported\ninteractions, 169,810 of which are deemed high-confidence, non-self interactions across\n12,793 proteins. High confidence is defined by a rigorously tested signal-to-noise\nthreshold as determined by comparison to well-established interactions. Briefly, InWeb\ncombines reported protein interactions from MINT, BIND, IntAct, KEGG annotated\nprotein-protein interactions (PPrel), KEGG Enzymes involved in neighboring steps\n(ECrel), Reactome and others as described elsewhere in detail. All human interactions\nwere pooled and interactions in orthologous protein pairs passing a strict threshold for\northology were included. 
Each interaction was assigned a probabilistic score based on the\nneighborhood of the interaction, the scale of the experiment in which the interaction was\nreported and the number of different publications in which the interaction had been cited.\n## References\nRossin EJ, Lage K, Raychaudhuri S, Xavier RJ, Tartar D, IIBDGC, Cotsapas C, Daly MJ.\n2011 Proteins Encoded in Genomic Regions Associated with Immune-Mediated Disease\nPhysically Interact and Suggest Underlying Biology. PLoS Genetics 7(1): e1001273\n\nLundby, A. \u0026 Rossin EJ et al. Annotation of loci from genome-wide association studies\nusing tissue-specific quantitative interaction proteomics. Nat. Methods 11, 868–874\n(2014).\n## Contact\nPlease email dapple@broadinstitute.org with any questions.  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenepattern%2Fdapple","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgenepattern%2Fdapple","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenepattern%2Fdapple/lists"}