{"id":35762854,"url":"https://github.com/bionetslab/digest-py","last_synced_at":"2026-01-07T00:00:23.851Z","repository":{"id":53149603,"uuid":"462419598","full_name":"bionetslab/digest-py","owner":"bionetslab","description":"Python package for DIGEST","archived":false,"fork":false,"pushed_at":"2023-07-27T06:59:05.000Z","size":678,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-07-27T07:55:14.213Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bionetslab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-02-22T18:12:32.000Z","updated_at":"2023-07-27T07:55:14.213Z","dependencies_parsed_at":"2023-01-19T06:45:15.094Z","dependency_job_id":null,"html_url":"https://github.com/bionetslab/digest-py","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/bionetslab/digest-py","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bionetslab%2Fdigest-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bionetslab%2Fdigest-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bionetslab%2Fdigest-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bionetslab%2Fdigest-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bionetslab","download_url":"https://codeload.github.com/bionetslab/digest-py/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/bionetslab%2Fdigest-py/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28230229,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2026-01-06T02:00:07.049Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-07T00:00:07.392Z","updated_at":"2026-01-07T00:00:23.844Z","avatar_url":"https://github.com/bionetslab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"DIGEST Logo\" src=\"https://github.com/bionetslab/digest/blob/main/digest_logo.png?raw=true\" width=\"500\" /\u003e\n\u003c/p\u003e\n\n# biodigest\nThe python package for [DIGEST](https://digest-validation.net/) (validation of **di**sease and **ge**ne **s**ets, clus**t**erings or subnetworks). It greatly facilitates in silico validation of gene and disease sets, clusterings or subnetworks via fully automated validation pipelines comprising disease and gene ID mapping, enrichment\nanalysis, comparisons of shared genes and variants, and background distribution estimation. 
Moreover, functionality is provided to automatically update the external databases used by the pipelines.\n\nA light version that excludes the subnetwork option, and therefore avoids the\nrequirement to install more complex Python packages, is available as [biodigest-light](https://pypi.org/project/biodigest-light/).\n\n[Source code](https://github.com/bionetslab/digest)\n\n\n## Setup for proper usage\nAfter installing biodigest, you need to install the [graph-tool package](https://git.skewed.de/count0/graph-tool/-/wikis/installation-instructions).\n\n```python\nimport biodigest\n```\n\nBefore you can run the validation, you need to download precalculated mappings and distance matrices. These can be obtained in two ways:\n### 1. [Recommended] Get data from the API\nThe API keeps all data up to date and checks for updates at regular intervals. This process takes 1-5 minutes depending on the internet connection.\n```python\nfrom biodigest import setup\nsetup.main(setup_type=\"api\")\n```\n### 2. Create data from scratch\nAll mappings are freshly fetched from the databases and the distance matrices are calculated. Be aware that this can take up to 3 hours. 
\n```python\nfrom biodigest import setup\nsetup.main(setup_type=\"create\")\n```\n\n## Run validation\n```python\nfrom biodigest.single_validation import single_validation\n# Signature with type hints and defaults (not a literal call)\nresults = single_validation(tar: Union[pd.DataFrame, set], tar_id: str, mode: str, distance: str = \"jaccard\",\n                            ref: Union[str, set] = None, ref_id: str = None, enriched: bool = False,\n                            network_data: dict = None, mapper: Mapper = FileMapper(), runs: int = config.NUMBER_OF_RANDOM_RUNS,\n                            background_model: str = \"complete\", replace=100, verbose: bool = False)\n```\nAll results, which can later be saved and visualized, are stored in `results` as data type `dict()`.\n### Parameters\n#### Required parameters\n- **tar**: Target input you want to be validated\n  - a clustering should be of type `pd.DataFrame()` with `columns=[\"id\",\"cluster\"]`\n  - a set should be of type `set()`\n- **tar_id**: ID type of the target (see possible options below)\n- **ref**: Reference to which **tar** will be compared (only for modes id-set and set-set) \n  - a single id should be of type `str`\n  - a set should be of type `set()`\n- **ref_id**: ID type of the reference (see possible options below)\n- **mode**: Desired mode. See possible options below.\n#### Optional parameters\n- **distance**: Distance measure used for pairwise comparison\n- **enriched**: Set to `True` if only enriched attributes of the reference set should be used (only for set-set)\n- **network_data**: Only for \"subnetwork\" and \"subnetwork-set\" mode. Dictionary consisting of {\"network_file\": path to network file,\n    \"prop_name\": name of vertex property with ids if network file of type graphml or gt,\n    \"id_type\": id type of network ids}\n- **background_model**: Model defining how random values should be picked. See possible options below.\n- **mapper**: Mapper object indicating where all files from the setup are saved. 
`[Default=FileMapper()]`\n- **runs**: Number of runs with random target values for p-value calculation\n- **replace**: Percentage of the original ids that should be replaced with random ids\n- **verbose**: Print additional information during the run\n#### Supported types\n- **gene types**: entrez, ensembl, symbol, uniprot\n- **disease types**: mondo, omim, snomedct, umls, orpha, mesh, doid, ICD-10\n#### Modes\n- **set**: Compare similarity inside the set using the mean of all pairwise comparisons\n- **id-set**: Compare target set to reference id\n- **set-set**: Compare target set to reference set\n- **clustering**: Compare cluster quality inside clustering. Either genes or diseases\n- **subnetwork**: Compare similarity inside the subnetwork nodes. Either genes or diseases\n- **subnetwork-set**: Compare target subnetwork to reference set. Both either genes or diseases\n#### Background models\n- **complete**: Random ids will be picked completely randomly\n- **term-pres**: Random ids will preserve the number of mapped terms for the replaced ids\n- **network**: Random ids will preserve the number of connected components in the given network.\n### Result\nThe method call returns the result as a JSON-compatible `dict` that consists of \nthe following elements:\n```python\nresult = {'status': 'Status text',\n          'input_values': {'values': dict(), 'mapped_ids': list()}, \n          'random_values': {'values': dict()},\n          'p_values': {'values': dict()}}\n```\n- **status**: contains either an error message if a mapping failed or \"ok\" if IDs could be mapped\n- **input_values**:\n  - **values**: table in dict format with the functional or genetic relevance score(s) determined solely for the input\n  - **mapped_ids**: list containing the IDs with non-empty annotations per functional or genetic annotation type\n- **random_values**:\n  - **values**: table in dict format with the functional or genetic relevance score(s) determined for all random runs\n- 
**p_values**: table in dict format with the empirical P-values, calculated using the selected background model and other parameters, that indicate the significance of the relevance scores derived from the input\n## Save and visualize results\n```python\nfrom biodigest.single_validation import save_results\nfrom biodigest.evaluation.d_utils.plotting_utils import create_plots, create_extended_plots\n\n# Save results into json file and 2 .csv table files\nsave_results(results: dict, prefix: str, out_dir)\n\n# Generate and save plots based on results\n# Consisting of p-value plot and mappability plot\ncreate_plots(results, mode, tar, tar_id, out_dir, prefix, file_type: str = \"pdf\")\n# Generate and save extended plots based on results\n# Consisting of distribution and sankey plots\ncreate_extended_plots(results, mode, tar, out_dir, prefix, file_type: str = \"pdf\", mapper:Mapper=FileMapper())\n```\n### Parameters\n#### Required parameters\n- **results**: Output created by the method `single_validation`, as data type `dict()`\n- **prefix**: Prefix for file names\n- **out_dir**: Output directory for the generated files\n#### Additional required parameters for create_plots\n- **tar**: Target input you want to be validated\n  - a clustering should be of type `pd.DataFrame()` with `columns=[\"id\",\"cluster\"]`\n  - a set should be of type `set()`\n- **tar_id**: ID type of the target (see possible options above)\n- **mode**: Desired mode. See possible options above.\n#### Optional parameters for create_plots\n- **file_type**: File type of the generated plot images.\n## Run significance contribution calculation\nIf you are interested in how the individual ids in the target set contribute to the \nfinal calculated empirical P-values, you can run the significance contribution \ncalculations. Keep in mind that the runtime increases linearly with\nthe number of ids in the input target set. 
\n```python\nfrom biodigest.single_validation import significance_contributions\nresults_sig = significance_contributions(results: dict, \n                                         tar: Union[pd.DataFrame, set], tar_id: str, mode: str, distance: str = \"jaccard\",\n                                         ref: Union[str, set] = None, ref_id: str = None, enriched: bool = False,\n                                         mapper: Mapper = FileMapper(), runs: int = config.NUMBER_OF_RANDOM_RUNS,\n                                         background_model: str = \"complete\", replace=100, verbose: bool = False)\n```\n### Parameters\n- **results**: Output generated by `single_validation` on the full input set.\nIt is used to calculate the significance contribution of each individual id in the input set.\n- **all other parameters:** see the parameters under \"Run validation\"\n## Save and visualize results\n```python\nfrom biodigest.single_validation import save_contribution_results\nfrom biodigest.evaluation.d_utils.plotting_utils import create_contribution_plots, create_contribution_graphs\n\n# Save results into json file and .csv table files for each validation type\nsave_contribution_results(results: dict, prefix: str, out_dir)\n\n# Generate and save plots based on results\n# Consisting of an overview heatmap of the top 15 ids \n# with the largest absolute overall significance contribution\n# and the top 10 largest positive and negative \n# significance contributions per annotation type\ncreate_contribution_plots(results_sig, out_dir, prefix, file_type: str = \"pdf\")\n# Generate a graph visualization of the subnetwork constructed\n# from the input set and the given (or default) network\n# with nodes colored by their significance contribution\ncreate_contribution_graphs(results_sig, tar_id, network_data, out_dir, prefix,\n                           file_type: str = \"pdf\", mapper: Mapper = FileMapper())\n```\n### Parameters\n#### Required parameters\n- **results**: 
Is the output created with method `significance_contributions` as data type `dict()`\n- **prefix**: Prefix for file names\n- **out_dir**: Output directory for the generated files\n#### Additional required parameters\n- **input_type**: \"genes\" or \"diseases\" based on id type of target set\n- **network_data**: Only for \"subnetwork\" and \"subnetwork-set\" mode. Dictionary consisting of {\"network_file\": path to network file,\n    \"prop_name\": name of vertex property with ids if network file of type graphml or gt,\n    \"id_type\": id type of network ids}\n- **mapper**: Mapper object indicating where all files from the setup are saved. `[Default=FileMapper()]`\n#### Optional parameters for create_plots\n- **file_type**: Type of the plots image files.\n## Example runs\nCheck out the [tutorial](https://github.com/bionetslab/digest-tutorial) to see examples of usage in a script.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbionetslab%2Fdigest-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbionetslab%2Fdigest-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbionetslab%2Fdigest-py/lists"}