{"id":34112898,"url":"https://github.com/churchmanlab/genewalk","last_synced_at":"2026-04-06T02:02:16.631Z","repository":{"id":51785650,"uuid":"167066459","full_name":"churchmanlab/genewalk","owner":"churchmanlab","description":"GeneWalk identifies relevant gene functions for a biological context using network representation learning","archived":false,"fork":false,"pushed_at":"2024-08-01T23:56:11.000Z","size":400,"stargazers_count":133,"open_issues_count":0,"forks_count":17,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-12-17T02:52:59.136Z","etag":null,"topics":["functional-genomics","machine-learning-algorithm"],"latest_commit_sha":null,"homepage":"https://churchman.med.harvard.edu/genewalk","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/churchmanlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-01-22T20:59:55.000Z","updated_at":"2025-11-26T21:08:15.000Z","dependencies_parsed_at":"2022-08-28T03:02:41.533Z","dependency_job_id":"b680ffe2-b99d-4449-8e6b-23da9bce43e0","html_url":"https://github.com/churchmanlab/genewalk","commit_stats":{"total_commits":473,"total_committers":4,"mean_commits":118.25,"dds":0.4820295983086681,"last_synced_commit":"af52c055e5408ddbc35428db09c41dbcedda9776"},"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/churchmanlab/genewalk","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/churchmanlab%2Fgenewalk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/churchmanlab%2Fgenewalk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/churchmanlab%2Fgenewalk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/churchmanlab%2Fgenewalk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/churchmanlab","download_url":"https://codeload.github.com/churchmanlab/genewalk/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/churchmanlab%2Fgenewalk/sbom","scorecard":{"id":282208,"data":{"date":"2025-08-11","repo":{"name":"github.com/churchmanlab/genewalk","commit":"f9ec5cda8d35dfc9787cf5be840a42771f9ef9bd"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.5,"checks":[{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":2,"reason":"Found 4/16 approved changesets -- score normalized to 2","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: 
.github/workflows/tests.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines 
if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/tests.yml:10: update your workflow using https://app.stepsecurity.io/secureworkflow/churchmanlab/genewalk/tests.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/tests.yml:11: update your workflow using https://app.stepsecurity.io/secureworkflow/churchmanlab/genewalk/tests.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/tests.yml:18: update your workflow using https://app.stepsecurity.io/secureworkflow/churchmanlab/genewalk/tests.yml/master?enable=pin","Warn: containerImage not pinned by hash: Dockerfile:1: pin your Docker image by updating python:3.6 to python:3.6@sha256:f8652afaf88c25f0d22354d547d892591067aa4026a7fa9a6819df9f300af6fc","Warn: pipCommand not pinned by hash: Dockerfile:3-6","Warn: pipCommand not pinned by hash: .github/workflows/tests.yml:23","Warn: pipCommand not pinned by hash: .github/workflows/tests.yml:24","Warn: pipCommand not pinned by hash: .github/workflows/tests.yml:29","Info:   0 out of   3 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 containerImage dependencies pinned","Info:   0 out of   4 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: BSD 2-Clause \"Simplified\" License: 
LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":9,"reason":"1 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: PYSEC-2020-73"],"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 21 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-17T16:11:19.867Z","repository_id":51785650,"created_at":"2025-08-17T16:11:19.867Z","updated_at":"2025-08-17T16:11:19.867Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31456664,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"online","status_checked_at":"2026-04-06T02:00:07.287Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["functional-genomics","machine-learning-algorithm"],"created_at":"2025-12-14T19:04:43.018Z","updated_at":"2026-04-06T02:02:16.624Z","avatar_url":"https://github.com/churchmanlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GeneWalk\n\n[![License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)\n[![Documentation](https://readthedocs.org/projects/genewalk/badge/?version=latest)](https://genewalk.readthedocs.io/en/latest/?badge=latest)\n[![PyPI version](https://badge.fury.io/py/genewalk.svg)](https://badge.fury.io/py/genewalk)\n[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](https://anaconda.org/bioconda/genewalk)\n[![Python 
3.8+](https://img.shields.io/pypi/pyversions/genewalk.svg)](https://www.python.org/downloads)\n\nGeneWalk determines for individual genes the functions that are relevant in a\nparticular biological context and experimental condition. GeneWalk quantifies\nthe similarity between vector representations of a gene and annotated GO terms\nthrough representation learning with random walks on a condition-specific gene\nregulatory network. Similarity significance is determined through comparison\nwith node similarities from randomized networks.\n\n## Install GeneWalk\nTo install the latest release of GeneWalk (preferred):\n```\npip install genewalk\n```\nTo install the latest code from Github (typically ahead of releases):\n```\npip install git+https://github.com/churchmanlab/genewalk.git\n```\n\nGeneWalk uses a number of resource files that it downloads as needed during\nruntime. To optionally pre-download these resource files in the default resource folder,\nthe command\n```\npython -m genewalk.resources\n```\ncan be run.\n\n## Using GeneWalk\n\n### Gene list file\nGeneWalk always requires as input a text file containing a list with genes of\ninterest relevant to the biological context. For example, differentially\nexpressed genes from a sequencing experiment that compares an experimental\nversus control condition. GeneWalk supports gene list files containing HGNC\nhuman gene symbols, HGNC IDs, human Ensembl gene IDs, MGI mouse gene IDs, RGD\nrat gene IDs, or human or mouse entrez IDs. GeneWalk internally maps these IDs \nto human genes. \n\nFor organisms other than human, mouse or rat, there are two options. The first\nis to map the genes to human orthologs yourself and then input the human ortholog \nlist as described above. Use this strategy if you consider the organism \nsufficiently related to human. The second option is to provide an input gene file\nwith custom gene IDs. These are not mapped to human genes. 
Use custom gene IDs \nfor more divergent organisms, such as Drosophila, worm, yeast, plants or bacteria. \nIn this case the user must also provide a custom gene network with GO annotations \nas input. See section Custom input networks for more details.\n\nEach line in the gene input file contains a gene identifier of one of the\nabove types.\n\n### GeneWalk command line interface\nOnce installed, GeneWalk can be run from the command line as `genewalk`, with\na set of required and optional arguments. The required arguments include the\nproject name, a path to a text file containing a list of genes, and an argument\nspecifying the type of gene identifiers in the file.\n\nExample\n```bash\ngenewalk --project context1 --genes gene_list.txt --id_type hgnc_symbol\n```\n\nBelow is the full documentation of the command line interface:\n\n```\ngenewalk [-h] [--version] --project PROJECT --genes GENES --id_type\n              {hgnc_symbol,hgnc_id,ensembl_id,mgi_id,rgd_id,entrez_human,entrez_mouse,custom}\n              [--stage {all,node_vectors,null_distribution,statistics,visual}]\n              [--base_folder BASE_FOLDER]\n              [--network_source {pc,indra,edge_list,sif,sif_annot,sif_full}]\n              [--network_file NETWORK_FILE] [--nproc NPROC]\n              [--nreps_graph NREPS_GRAPH] [--nreps_null NREPS_NULL]\n              [--alpha_fdr ALPHA_FDR] [--dim_rep DIM_REP] [--save_dw SAVE_DW]\n              [--random_seed RANDOM_SEED]\n\n\nrequired arguments:\n  --version             Print the version of GeneWalk and exit.\n  --project PROJECT     A name for the project which determines the folder\n                        within the base folder in which the intermediate and\n                        final results are written. Must contain only\n                        characters that are valid in folder names.\n  --genes GENES         Path to a text file with a list of differentially\n                        expressed genes. 
The type of gene identifiers used in\n                        the text file is provided in the id_type argument.\n  --id_type {hgnc_symbol,hgnc_id,ensembl_id,mgi_id,rgd_id,entrez_human,entrez_mouse,custom}\n                        The type of gene IDs provided in the text file in the\n                        genes argument. Possible values are: hgnc_symbol,\n                        hgnc_id, ensembl_id, mgi_id, rgd_id, entrez_human,\n                        entrez_mouse, and custom. If custom, a network_source\n                        of sif_annot or sif_full must be used.\n\noptional arguments:\n  --stage {all,node_vectors,null_distribution,statistics,visual}\n                        The stage of processing to run. Default: all\n  --base_folder BASE_FOLDER\n                        The base folder used to store GeneWalk temporary and\n                        result files for a given project. Default:\n                        ~/genewalk\n  --network_source {pc,indra,edge_list,sif,sif_annot,sif_full}\n                        The source of the network to be used. Possible values\n                        are: pc, indra, edge_list, sif, sif_annot, and\n                        sif_full. In case of indra, edge_list, sif, sif_annot,\n                        and sif_full, the network_file argument must be\n                        specified. Default: pc\n  --network_file NETWORK_FILE\n                        If network_source is indra, this argument points to a\n                        Python pickle file in which a list of INDRA Statements\n                        constituting the network is contained. In case\n                        network_source is edge_list, sif, sif_annot, or\n                        sif_full, the network_file argument points to a text\n                        file representing the network. 
See README section\n                        Custom input networks for full description of file\n                        format requirements.\n  --nproc NPROC         The number of processors to use in a multiprocessing\n                        environment. Default: 1\n  --nreps_graph NREPS_GRAPH\n                        The number of repeats to run when calculating node\n                        vectors on the GeneWalk graph. Default: 3\n  --nreps_null NREPS_NULL\n                        The number of repeats to run when calculating node\n                        vectors on the random network graphs for constructing\n                        the null distribution. Default: 3\n  --alpha_fdr ALPHA_FDR\n                        The false discovery rate to use when outputting the\n                        final statistics table. If 1 (default), all\n                        similarities are output, otherwise only the ones whose\n                        false discovery rate are below this parameter are\n                        included. Default: 1 \n                        For visualization a default value of 0.1 for both global\n                        and gene-specific plots is used. Lower this value to \n                        increase the stringency of the regulator gene selection \n                        procedure.\n  --dim_rep DIM_REP     Dimension of vector representations (embeddings). This \n                        value should only be increased if genewalk with the \n                        default value generates no statistically significant \n                        results, for instance with very large (\u003e2500) input \n                        gene lists. Alternatively, it can be decreased in case \n                        (nearly) all GO annotations are significant, for \n                        instance with very short gene lists. 
Default: 8\n  --save_dw SAVE_DW     If True, the full DeepWalk object for each repeat is\n                        saved in the project folder. This can be useful for\n                        debugging but the files are typically very large.\n                        Default: False\n  --random_seed RANDOM_SEED\n                        If provided, the random number generator is seeded\n                        with the given value. This should only be used if the\n                        goal is to deterministically reproduce a prior result\n                        obtained with the same random seed.\n\n```\n\n### Output files\nGeneWalk automatically creates a `genewalk` folder in the user's home folder\n(or the user specified base_folder).\nWhen running GeneWalk, one of the required inputs is a project name.\nA sub-folder is created for the given project name where all intermediate and\nfinal results are stored. The files stored in the project folder are:\n- **`genewalk_results.csv`** - The main results table, a comma-separated values text file. See below for detailed description.\n- `genes.pkl` - A processed representation of the given gene list, in Python pickle (.pkl) binary file format.\n- `multi_graph.pkl` - A networkx MultiGraph resembling the GeneWalk network which was assembled based on the\ngiven list of genes, an interaction network, GO annotations, and the GO ontology.\n- `deepwalk_node_vectors_*.pkl` - A set of learned node vectors for each analysis repeat for the graph.\n- `deepwalk_node_vectors_rand_*.pkl` - A set of learned node vectors for each analysis repeat for a random graph.\n- `genewalk_rand_simdists.pkl` - Distributions constructed from repeats.\n- `deepwalk_*.pkl` - A DeepWalk object for each analysis repeat on the graph\n(only present if save_dw argument is set to True).\n- `deepwalk_rand_*.pkl` - A DeepWalk object for each analysis repeat on a random graph\n(only present if save_dw argument is set to True).  
\n\n### Figure files\nGeneWalk also automatically generates figures to visualize its results in the\nproject/figures sub-folder:\n- **`index.html`**: an HTML page that includes all the figures generated, as\n  described below.\n- barplots with GO annotations ranked by relevance for each input gene that\n  GeneWalk was able to generate results for. The filenames contain the\n  corresponding human gene symbol and input gene id: `barplot_[symbol]_[gene\n  id]_x_mlog10global_padj_y_GO.png`.\n- `regulators_x_gene_con_y_frac_rel_go(.png and .pdf)`: scatter plot to\n  identify regulator genes of interest. These have a large gene connectivity\n  and high fraction of relevant GO annotations. For more information see our\n  publication.\n- `genewalk_regulators.csv`: list with regulator genes that are named in the\n  regulators scatterplot.\n- `moonlighters_x_go_con_y_frac_rel_go(.png and .pdf)`: scatter plot to\n  identify moonlighting genes: genes with many GO annotations of which a low\n  fraction are relevant. For more information see our publication.\n- `genewalk_moonlighters.csv`: list with moonlighting genes that are named in\n  the moonlighting scatterplot.\n- `genewalk_scatterplots.csv`: data corresponding to the regulator and\n  moonlighter scatter plots.  
This file can be used for further gene\n  prioritization analyses.\n\n\n### GeneWalk results file description\n`genewalk_results.csv` is the main GeneWalk output table, a comma-separated values text file\nwith the following column headers:\n- hgnc_id - human gene HGNC identifier.\n- **hgnc_symbol** - human gene symbol.\n- **go_name** - GO term name.\n- go_id - GO term identifier.\n- go_domain - Ontology domain that GO term belongs to\n(biological process, cellular component or molecular function).\n- ncon_gene - number of connections to gene in GeneWalk network.\n- ncon_go - number of connections to GO term in GeneWalk network.\n- **global_padj** - false discovery rate (FDR) adjusted p-value of the \nsimilarity between gene and GO term, when correcting for testing over all \ngene-GO term pairs present in the output file.\nThis is the key statistic that indicates how relevant the gene-GO term pair \n(gene function) is in the particular biological context or tested condition. \nGlobal_padj should be used for global analyses that\nconsider all the GeneWalk output simultaneously, such as gene prioritization\nprocedures. GeneWalk determines an adjusted p-value with Benjamini-Hochberg FDR \ncorrection for multiple testing of all connected GO terms for each \nnreps_graph repeat analysis. The value presented here is the average (mean \nestimate) over all p-adjust values from all nreps_graph repeat analyses. \n- **gene_padj** - FDR adjusted p-value of the similarity between gene and \nGO term, when correcting for multiple testing over all GO annotations of \nthat gene. This is the key statistic when investigating the functions of one \n(or a few) pre-defined gene(s) of interest. Gene_padj determines the statistical \nsignificance of each GO annotation (function) and can be used to \nsensitively rank GO annotations to reflect the relevance to the gene of interest\nin the particular biological context or tested condition. 
When you consider all\n(or many) input genes simultaneously, use global_padj instead. Averaged \nover nreps_graph repeat runs as for global_padj. \n- pval - p-value of gene - GO term similarity, not corrected for multiple\nhypothesis testing. Averaged over nreps_graph repeat runs.\n- sim - gene - GO term (cosine) similarity, averaged over nreps_graph repeat runs.\n- sem_sim - standard error on sim (mean estimate).\n- cilow_global_padj - lower bound of 95% confidence interval on global_padj \n(mean estimate) from the nreps_graph repeat analyses.\n- ciupp_global_padj - upper bound of 95% confidence interval on global_padj.\n- cilow_gene_padj - lower bound of 95% confidence interval on gene_padj\n(mean estimate) from the nreps_graph repeat analyses.\n- ciupp_gene_padj - upper bound of 95% confidence interval on gene_padj.\n- cilow_pval - lower bound of 95% confidence interval on pval (mean estimate)\nfrom the nreps_graph repeat analyses.\n- ciupp_pval - upper bound of 95% confidence interval on pval.\n- mgi_id, rgd_id, ensembl_id, entrez_human or entrez_mouse - if one of\n  these gene identifier types was provided as input, the GeneWalk results table\n  starts with an additional column containing the input gene identifiers. In the\n  case of mouse genes, the corresponding hgnc_id and hgnc_symbol refer to the\n  human ortholog gene used for the GeneWalk analysis.\n\n\n### Run time and stages of GeneWalk algorithm\nRecommended number of processors (optional argument: nproc) for a short (1-2h)\nrun time is 4:\n```bash\ngenewalk --project context1 --genes gene_list.txt --id_type hgnc_symbol --nproc 4\n```\nBy default GeneWalk will run with 1 processor, resulting in a longer overall\nrun time: 6-12h.\nGiven a list of genes, GeneWalk runs four stages of analysis:\n1. Assembling a GeneWalk network and learning node vector representations\nby running DeepWalk on this network, for a specified number of repeats.\nTypical run time: one to a few hours.\n2. 
Learning random node vector representations by running DeepWalk on a set of\nrandomized versions of the GeneWalk network, for a specified number of\nrepeats. Typical run time: one to a few hours.\n3. Calculating statistics of similarities between genes and GO terms, and\noutputting the GeneWalk results in a table. Typical run time: a few minutes.\n4. Visualization of the GeneWalk results generated in the project/figures subfolder.\nTypical run time: 1-10 mins depending on the number of input genes.\n\nGeneWalk can either be run once to complete all these stages (default), or\ncalled separately for each stage (optional argument: stage).  Recommended\navailable memory: 16 GB or 32 GB of RAM.  GeneWalk\noutputs the uncertainty (95% confidence intervals) of the similarity\nsignificance (global and gene p-adjust). Depending on the context-specific network\ntopology, this uncertainty can be large for individual gene - function\nassociations. However, if the uncertainties turn out to be very large overall, one\ncan set the optional arguments nreps_graph to 10 (or more) and nreps_null to 10\nto increase the algorithm's precision. This comes at the cost of an increased\nrun time.\n\n\n### Custom input networks\nBy default, GeneWalk uses the PathwayCommons resource (`--network_source pc`)\nto create a human gene network. It then automatically adds edges\nrepresenting GO annotations for input genes and ontology relations between\nGO terms. However, there are options to run GeneWalk with a custom network as\nan input. \n\nFirst, specify the `--network_source` argument as one of the alternative sources:\n`{indra, edge_list, sif, sif_annot, sif_full}`. \n\nIf custom gene IDs are used (`--id_type custom`) in the input gene list, for\ninstance from a model organism, choose `sif_annot` or `sif_full` as the network source.\n\nThen, include the argument `--network_file` with the path to the custom network \ninput file. 
The network file format has to correspond to the chosen\n`--network_source`, as follows. \n\nThe `sif/sif_annot/sif_full` options require the network file in a simple \ninteraction file (SIF) format. Each row of the SIF text file consists of \nthree comma-separated entries representing source, relation type, and target.\nThe relation type is not explicitly used by GeneWalk, and can be set\nto an arbitrary label.\n\nThe difference between the `sif`, `sif_annot`, and `sif_full` options:\n- `sif`: the input SIF can contain only *human* gene-gene relations. \n   Genes have to be encoded as human HGNC gene symbols (for example KRAS).\n   GO annotations for genes, as well as ontology relations \n   between GO terms are added automatically by GeneWalk. \n- `sif_annot`: the input SIF has to contain both\n  gene-gene relations, and GO annotations for genes: rows where the\n  source is a gene, and the target is a GO term. Use GO IDs with prefix \n  (for example GO:0000186) to encode GO terms. Genes should be encoded the same\n  as in the gene input list and do not have to correspond to human genes. \n  Ontology relations between GO terms are then added automatically by GeneWalk.\n- `sif_full`: the input SIF has to contain all GeneWalk network edges: \n  gene-gene relations, GO annotations for genes, and ontology relations between\n  GO terms. GeneWalk does not add any more edges to the network. Encode genes and\n  GO terms in the same manner as for `sif_annot`.\n\nThe `edge_list` option is a simplified version of the `sif` option. It requires \na network text file that contains rows with two columns each, a source and a target. \nIn other words, it omits the relation type column from the SIF format. Further file \npreparation requirements are the same as for the `sif` option.\n\nThe `indra` option requires as custom network input file a Python pickle file \ncontaining a list of INDRA Statements. 
These statements can represent human gene-gene, \nas well as gene-GO relations from which network edges are derived. Human GO \nannotations and ontology relations between GO terms are then added automatically \nby GeneWalk during network construction.\n\n\n### Further documentation\nFor a tutorial and more general information see the\n[GeneWalk website](http://churchman.med.harvard.edu/genewalk).  \nFor further code documentation see our [readthedocs page](https://genewalk.readthedocs.io).\n\n\n### Citation\nRobert Ietswaart, Benjamin M. Gyori, John A. Bachman, Peter K. Sorger, and\nL. Stirling Churchman  \n*GeneWalk identifies relevant gene functions for a biological context using network\nrepresentation learning*,  \nGenome Biology **22**, 55 (2021). [https://doi.org/10.1186/s13059-021-02264-8](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02264-8)  \n\n\n### Funding\nThis work was supported by National Institutes of Health grant 5R01HG007173-07\n(L.S.C.), EMBO fellowship ALTF 2016-422 (R.I.), and DARPA grants W911NF-15-1-0544\nand W911NF018-1-0124 (P.K.S.).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchurchmanlab%2Fgenewalk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchurchmanlab%2Fgenewalk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchurchmanlab%2Fgenewalk/lists"}