{"id":17141368,"url":"https://github.com/agitter/multipcsf","last_synced_at":"2025-03-24T08:22:00.805Z","repository":{"id":86493423,"uuid":"47654267","full_name":"agitter/MultiPCSF","owner":"agitter","description":"An implementation of the Multi-PCSF algorithm described in","archived":false,"fork":false,"pushed_at":"2019-01-10T20:05:55.000Z","size":407,"stargazers_count":2,"open_issues_count":2,"forks_count":4,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-01-29T13:46:25.246Z","etag":null,"topics":["multi-task","network","protein-protein-interaction","steiner-tree"],"latest_commit_sha":null,"homepage":"http://doi.org/10.1142/9789814583220_0005","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/agitter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-12-08T22:45:11.000Z","updated_at":"2020-06-10T19:40:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"1cbf6101-9c38-43e2-a664-5146b830c698","html_url":"https://github.com/agitter/MultiPCSF","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agitter%2FMultiPCSF","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agitter%2FMultiPCSF/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agitter%2FMultiPCSF/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agitter%2FMultiPCSF/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/agitter","download_url":"https://codeload.github.com/agitter/MultiPCSF/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245233042,"owners_count":20581730,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["multi-task","network","protein-protein-interaction","steiner-tree"],"created_at":"2024-10-14T20:25:09.707Z","updated_at":"2025-03-24T08:22:00.795Z","avatar_url":"https://github.com/agitter.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[Gitter et al 2014]: http://www.worldscientific.com/doi/abs/10.1142/9789814583220_0005\n[Omics Integrator]: https://github.com/fraenkel-lab/OmicsIntegrator\n[msgsteiner]: http://areeweb.polito.it/ricerca/cmp/code/bpsteiner\n[TCGA 2012]: http://www.nature.com/nature/journal/v490/n7418/full/nature11412.html\n[Szklarczyk et al 2011]: http://nar.oxfordjournals.org/content/39/suppl_1/D561.long\n[TCGA]: http://cancergenome.nih.gov/publications/publicationguidelines\n[STRING]: http://string-db.org/cgi/access.pl?footer_active_subpage=licensing\n[Database of Cell Signaling]: http://stke.sciencemag.org/about/help/cm\n[Gough 2002]: https://doi.org/10.1111/j.1749-6632.2002.tb04532.x\n[Microsoft Research]: https://www.microsoft.com/en-us/research/lab/microsoft-research-new-england/\n\n# Multi-PCSF\n[![DOI](https://zenodo.org/badge/47654267.svg)](https://zenodo.org/badge/latestdoi/47654267)\n\nThis repository contains an implementation of the multi-sample prize-collecting Steiner forest (Multi-PCSF) algorithm described in [Gitter et al 2014].\nThis code is provided for reproducibility of the results in the manuscript but is no longer under active development.\nThe [Omics Integrator] website describes how to install the [msgsteiner] dependency required by Multi-PCSF.\n\n[Omics Integrator 2](https://github.com/fraenkel-lab/OmicsIntegrator2/tree/master/multi-PCSF) from the Fraenkel laboratory contains a re-implementation of Multi-PCSF with additional features, such as support for a hierarchical clustering of samples.\n\n## Example\n`BreastCancer.sh` in the scripts subdirectory provides an example of how to run\nMulti-PCSF.  Before running the script, the `msgpath` variable must be set to\nthe location of the msgsteiner executable, including the file name.\n\n## Data\nThe breast cancer tumor sample data and protein-protein interaction network data\ndescribed in the Multi-PCSF manuscript are provided as an example dataset.  If\nyou use these data in a manuscript, cite [TCGA 2012] for the breast cancer data\nand [Szklarczyk et al 2011] for the STRING protein-protein interaction network\nand see their respective websites ([TCGA], [STRING]) for the terms of use.\n\nThe *Science Signaling* Database of Cell Signaling EGFR pathway that was used to\nsimulate samples is also provided in the `data` subdirectory.  If you use this\npathway in a manuscript, cite [Gough 2002] and see the [Database of Cell\nSignaling] website for the terms of use.\n\n## Usage\nOnly the most commonly used options are described below.  Use `python\nConstrainedMultiSample.py -h` to view the complete usage message. See the\nprovided example data for file formatting guidelines.  Please open an issue with\nany usage questions.\n```\nUsage: ConstrainedMultiSample.py [options]\n\nOptions:\n  -h, --help            show this help message and exit\n  --interactomepath=INTERACTOMEPATH\n                        This path points to the directory that contains the\n                        interaction network files\n  --terminalpath=TERMINALPATH\n                        This path points to the directory that contains the\n                        terminal (node prize) files\n  --resultpath=RESULTPATH\n                        This path points to the directory where the output\n                        files will be written.\n  --undirectedfile=UNDIRECTEDFILE\n                        The name of the protein-protein interaction file in\n                        the interactomepath directory.  The file is expected\n                        to contain undirected interactions with probabilistic\n                        weights (e.g in [0,1]). Columns should be ordered\n                        [prot1 prot2 weight].\n  --terminalfile=MASTERTERMINALFILE\n                        A file in the terminalpath directory that lists the\n                        files that give the node prizes for each sample.  All\n                        listed filenames should be relative to terminal path.\n                        If gene penalties are given in the terminal files,\n                        gene names should end with '_MRNA'.  Optionally can\n                        include a tab-separated second column that assigns\n                        each sample to a group so the forests are only\n                        constrained to be similar to other samples in the same\n                        group.\n  --msgpath=MSGPATH     The path and file name of the msgsteiner executable\n  --depth=DEPTH         Depth parameter that limits the maximum depth from the\n                        Steiner tree root to the leaves\n  --W=W                 The cost of the edges from the artificial root node to\n                        its neighbors.\n  --beta=BETA           The scaling factor applied to the node prizes, which\n                        is used to control the relative strength of node\n                        prizes and edge costs.  This scaling is only performed\n                        once when the initial stp files are created.\n  --lambda=LAMBDA1      The tradeoff coefficient for the penalty incurred by\n                        nodes in the Steiner forests that are not in the set\n                        of common nodes.\n  --alpha=LAMBDA2       The tradeoff coefficient for the reward on the size of\n                        the set of common nodes when using unweighted\n                        artificial prizes or the power to which the node\n                        frequency is taken for weighted prizes.\n  --mu=MU               A parameter used to penalize high-degree nodes from\n                        being selected as Steiner nodes.  Does not affect\n                        prize nodes but does affect artificial prizes.  The\n                        penalty is -mu*degree.  Set mu \u003c= 0 to disable the\n                        penalty (default).\n  --iterations=ITERATIONS\n                        The number of iterations to run\n  --workers=WORKERS     The number of worker processes to use in the\n                        multiprocessing pool or threads to use in multi-\n                        threaded belief propagation.  Should not exceed the\n                        number of cores available.  Defaults to the number of\n                        CPUs.\n  --artificialprizes=ARTIFICIALPRIZES\n                        Use 'positive' or 'negative' prizes to encourage trees\n                        to include common set proteins.  Use\n                        'positiveWeighted' or 'negativeWeighted' (default)\n                        prizes to construct weighted artificial prizes based\n                        on the node frequency in the most recent forests.\n  --dummyneighbors=DUMMYNEIGHBORS\n                        Connect the dummy node to all 'prizes' (default) or\n                        'nonprizes'.\n  --itermode=ITERMODE   Learn forests simultaneously in 'batch' (default) or\n                        sequentially in 'random' order.  Batch mode computes\n                        artificial prizes with respect to all forests at the\n                        previous iteration.  Random mode computes prizes for a\n                        specific sample given the most recent forests for all\n                        other samples.\n```\n\n## Output\nSeveral subdirectories are created in the directory specified by the\n`--resultpath` argument.  The `initial` and `itr*` directories (one for each of\nthe iterations specified by the `--iterations` argument) provide detailed\ninformation about intermediate results.  Except for the last `itr*` directory,\nthese can typically be deleted after Multi-PCSF terminates.\n\nThe location of the final Multi-PCSF networks depends on the settings. If\n`--artificialprizes` was set to one of the negative prize options or only one\niteration was run, the output networks are in the last `itr*` directory.  If\npositive artificial prizes were used, a post-processing pruning step is\nexecuted.  This runs the Steiner forest algorithm once more for each sample to\nprune nodes in the network that do not connect real prize nodes to the forest\nbut rather were included only due to their positive artificial prizes.  In this\ncase, the output networks are in the `final` directory.\n\nThe output directory contains intermediate files and the following files that\nare most useful for interpreting and visualizing the networks.  For each input\nfile `\u003csample\u003e` listed in the `--terminalfile` input file, several output files\nwill be created:\n* `symbol_\u003csample\u003e_\u003coptions\u003e.txt`: `\u003csample\u003e` is the input sample name and\n`\u003coptions\u003e` are the values of the `W`, `beta`, and `depth` arguments. This\nspace-separated file contains a line for each edge in the output network, where\neach line provides the names of the interacting proteins. The artificial root\nnode `DUMMY` is still present. This is typically the most relevant\nrepresentation of the output network. The edges are the same as the edges in the\nmsgsteiner output file `\u003csample\u003e_\u003coptions\u003e.txt`.\n* `symbol_fullnetwork_\u003csample\u003e_\u003coptions\u003e.txt`: `\u003csample\u003e` is the input sample name\nand `\u003coptions\u003e` are the values of the `W`, `beta`, and `depth` arguments. This\ntab-separated file contains a line for each edge in the output network. The\nartificial root node has been removed.  The `steiner` edges are the edges from\nthe optimal Steiner forest.  The `intra` edges are additional edges that have\nbeen added back to the Steiner forest, which are sometimes useful for\nidentifying alternative pathway connections.\n* `\u003csample\u003e_\u003coptions\u003e.output`: Summary statistics of the Steiner forest produced.\n* `\u003csample\u003e_\u003coptions\u003e.objective`: Output messages from the msgsteiner program,\nincluding optimization progress.\n\nThe other files are intermediate files used to create the input for msgsteiner\nor prepare the output network file from the msgsteiner output.\n\n## Simulation\nThe `simulation` subdirectory contains the code that was used to simulate\ninput samples from synthetic or real pathways.  This code currently serves as\nextended documentation and is not runnable.  It uses an old version of\n`ConstrainedMultiSample.py` and needs to be updated to use the refactored\nversion, which accepts different command line arguments.\n\n## Roadmap\n* Implement multi-sample functionality in Omics Integrator\n* Refactor simulation code to generate prizes from known or synthetic pathways\n* Document support for distinct groups of samples\n\n## Developers\n* Nurcan Tuncbag\n* Anthony Gitter\n\n## Acknowledgements\nPortions of the Multi-PCSF software were developed with support from [Microsoft\nResearch] while Anthony Gitter was a postdoctoral researcher there.  We thank\nMicrosoft for granting permission to release the code as open source under the\nSample Code Exception and Paul Oka in particular for coordinating the release.\nWe acknowledge all authors of [Gitter et al 2014] for their role in the\nalgorithm development.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagitter%2Fmultipcsf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagitter%2Fmultipcsf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagitter%2Fmultipcsf/lists"}