{"id":31799477,"url":"https://github.com/mims-harvard/pinnacle","last_synced_at":"2025-10-10T22:02:20.313Z","repository":{"id":180238918,"uuid":"664823212","full_name":"mims-harvard/PINNACLE","owner":"mims-harvard","description":"Contextual AI models for single-cell protein biology","archived":false,"fork":false,"pushed_at":"2025-02-14T16:51:44.000Z","size":790,"stargazers_count":79,"open_issues_count":1,"forks_count":20,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-14T17:37:55.083Z","etag":null,"topics":["context-aware","contextual","geometric-deep-learning","graph-neural-networks","proteins","therapeutics"],"latest_commit_sha":null,"homepage":"https://zitniklab.hms.harvard.edu/projects/PINNACLE","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mims-harvard.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-10T20:38:50.000Z","updated_at":"2025-02-14T16:51:48.000Z","dependencies_parsed_at":"2023-12-30T04:28:31.301Z","dependency_job_id":"50948bac-9833-47aa-996d-37fa5f9e13d3","html_url":"https://github.com/mims-harvard/PINNACLE","commit_stats":null,"previous_names":["mims-harvard/pinnacle"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mims-harvard/PINNACLE","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mims-harvard%2FPINNACLE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mims-harvard%2FPINNACLE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mims-harvard%2FPINNACLE
/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mims-harvard%2FPINNACLE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mims-harvard","download_url":"https://codeload.github.com/mims-harvard/PINNACLE/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mims-harvard%2FPINNACLE/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279005430,"owners_count":26083891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["context-aware","contextual","geometric-deep-learning","graph-neural-networks","proteins","therapeutics"],"created_at":"2025-10-10T22:01:43.227Z","updated_at":"2025-10-10T22:02:20.302Z","avatar_url":"https://github.com/mims-harvard.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PINNACLE: Contextual AI models for single-cell protein biology\n\n**Authors**:\n- [Michelle M. 
Li](http://michellemli.com)\n- [Yepeng Huang](http://zitniklab.hms.harvard.edu)\n- [Marissa Sumathipala](http://zitniklab.hms.harvard.edu)\n- [Man Qing Liang](http://zitniklab.hms.harvard.edu)\n- [Alberto Valdeolivas]()\n- [Ashwin Ananthakrishnan]()\n- [Katherine Liao]()\n- [Daniel Marbach]()\n- [Marinka Zitnik](http://zitniklab.hms.harvard.edu)\n\n## Overview of PINNACLE\n\nProtein interaction networks are a critical component in studying the function and therapeutic potential of proteins. However, accurately modeling protein interactions across diverse biological contexts, such as tissues and cell types, remains a significant challenge for existing algorithms.\n\nWe introduce PINNACLE, a flexible geometric deep learning approach that trains on contextualized protein interaction networks to generate context-aware protein representations. Leveraging a multi-organ single-cell transcriptomic atlas of humans, PINNACLE provides 394,760 protein representations split across 156 cell-type contexts from 24 tissues and organs. We demonstrate that PINNACLE's contextualized representations of proteins reflect cellular and tissue organization and PINNACLE's tissue representations enable zero-shot retrieval of tissue hierarchy. Infused with cellular and tissue organization, our contextualized protein representations can easily be adapted for diverse downstream tasks.\n\nWe fine-tune PINNACLE to study the genomic effects of drugs in multiple cellular contexts and show that our context-aware model significantly outperforms state-of-the-art, yet context-agnostic, models. Enabled by our context-aware modeling of proteins, PINNACLE is able to nominate promising protein targets and cell-type contexts for further investigation. 
PINNACLE exemplifies and empowers the long-standing paradigm of incorporating context-specific effects for studying biological systems, especially the impact of disease and therapeutics.\n\n### The PINNACLE Algorithm\n\nPINNACLE is a self-supervised geometric deep learning model that can generate protein representations in diverse cell type contexts. It is trained on a set of context-aware protein interaction networks unified by a cellular and tissue network to produce contextualized protein representations based on cell type activation. Unlike existing approaches, which do not consider biological context, PINNACLE produces multiple representations of proteins based on their cell type context, representations of the cell type contexts themselves, and representations of the tissue hierarchy. \n\nGiven the multi-scale nature of the model inputs, PINNACLE is equipped to learn the topology of proteins, cell types, and tissues in a single unified embedding space. PINNACLE uses protein-, cell type-, and tissue-level attention mechanisms and objective functions to inject cellular and tissue organization into the embedding space. Intuitively, pairs of nodes that share an edge should be embedded nearby, proteins of the same cell type context should be embedded nearby (and far from proteins in other cell type contexts), and proteins should be embedded close to their cell type context (and far from other cell type contexts).\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"img/pinnacle_overview.png?raw=true\" width=\"700\" \u003e\n\u003c/p\u003e\n\n\n## Installation and Setup\n\n### :one: Download the Repo\n\nFirst, clone the GitHub repository:\n\n```\ngit clone https://github.com/mims-harvard/PINNACLE\ncd PINNACLE\n```\n\n### :two: Set Up Environment\n\nThis codebase uses Python, PyTorch, PyTorch Geometric, and other packages. 
To create an environment with all of the required packages, please ensure that [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) is installed and then execute the commands:\n\n```\nconda env create -f environment.yml\nconda activate pinnacle\nbash install_pyg.sh\n```\n\n### :three: Download Datasets\n\nThe data is hosted on [Figshare](https://figshare.com/articles/software/PINNACLE/22708126). To maintain the directory structure while downloading the files, make sure to select all files and download in the original format. Make sure to also unzip all files in the download.\n\nWe provide the following datasets for training PINNACLE:\n- Global reference protein interaction network\n- Cell type specific protein interaction networks\n- Metagraph of cell type and tissue relationships\n\nThe networks are provided in the appropriate format for PINNACLE. If you would like to use your own set of contextualized networks, please adhere to the format used in the cell type specific protein interaction networks (see [README](https://github.com/mims-harvard/PINNACLE/blob/main/data_prep/README.md) in `data_prep` folder for more details). The file should be structured as a tab-delimited table, where each line contains information for a single context. Each line must contain the following elements (in this order): index, context name (e.g., cell type name), comma-delimited list of nodes. The lists of nodes are used to extract a subgraph from the global reference network (e.g., global reference protein interaction network).\n\n### :four: (Optional) Download Model Checkpoints\nWe also provide checkpoints for PINNACLE after pretraining. The checkpoints for PINNACLE can be found [here](https://figshare.com/articles/software/PINNACLE/22708126). Make sure all downloaded files are unzipped. 
You can use these checkpoints (and/or embeddings) directly with the scripts in the `finetune_pinnacle` folder instead of training the models yourself.\n\n## Usage\n\n### Finetune PINNACLE on Your Own Datasets\n\nYou can finetune PINNACLE on your own datasets by using our provided model checkpoints or contextualized representations (i.e., no re-training needed). Please review this [README](https://github.com/mims-harvard/PINNACLE/blob/main/finetune_pinnacle/README.md) to learn how to preprocess and finetune PINNACLE on your own datasets.\n\n### Train PINNACLE\n\nYou can reproduce our results or pretrain PINNACLE on your own networks:\n```\ncd pinnacle\npython train.py \\\n        --G_f ../data/networks/global_ppi_edgelist.txt \\\n        --ppi_dir ../data/networks/ppi_edgelists/ \\\n        --mg_f ../data/networks/mg_edgelist.txt \\\n        --save_prefix ../data/pinnacle_embeds/\n```\n\nTo see and/or modify the default hyperparameters, please see the `get_hparams()` function in `pinnacle/parse_args.py`.\n\nAn example bash script is provided in `pinnacle/run_pinnacle.sh`.\n\n### Visualize PINNACLE Representations\n\nAfter training PINNACLE, you can visualize PINNACLE's representations using `evaluate/visualize_representations.py`.\n\n### Finetune PINNACLE for nominating therapeutic targets\n\nAfter training PINNACLE (you may also simply use our already-trained models), you can finetune PINNACLE for any downstream biomedical task of interest. Here, we provide instructions for nominating therapeutic targets. 
An example bash script can be found [here](https://github.com/mims-harvard/PINNACLE/blob/main/finetune_pinnacle/run_model.sh).\n\n:sparkles: To finetune PINNACLE for nominating therapeutic targets of rheumatoid arthritis:\n\n```\ncd finetune_pinnacle\npython train.py \\\n        --disease EFO_0000685 \\\n        --embeddings_dir ./data/pinnacle_embeds/\n```\n\n:sparkles: To finetune PINNACLE for nominating therapeutic targets of inflammatory bowel disease:\n\n```\ncd finetune_pinnacle\npython train.py \\\n        --disease EFO_0003767 \\\n        --embeddings_dir ./data/pinnacle_embeds/\n```\n\nTo generate predictions for a different therapeutic area, find the disease ID from OpenTargets and change the `--disease` flag.\n\nTo see and/or modify the default hyperparameters, please see the `get_hparams()` function in `finetune_pinnacle/train_utils.py`.\n\n## Additional Resources\n\n- [Paper](https://www.biorxiv.org/content/10.1101/2023.07.18.549602)\n- [Demo](https://huggingface.co/spaces/michellemli/PINNACLE/)\n- [Project Website](https://zitniklab.hms.harvard.edu/projects/PINNACLE/)\n\n```\n@article{pinnacle,\n  title={Contextual AI models for single-cell protein biology},\n  author={Li, Michelle M and Huang, Yepeng and Sumathipala, Marissa and Liang, Man Qing and Valdeolivas, Alberto and Ananthakrishnan, Ashwin N and Liao, Katherine and Marbach, Daniel and Zitnik, Marinka},\n  journal={Nature Methods},\n  pages={1--12},\n  year={2024},\n  publisher={Nature Publishing Group US New York}\n}\n```\n\n\n## Questions\n\nPlease leave a GitHub issue or contact Michelle Li at michelleli@g.harvard.edu.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmims-harvard%2Fpinnacle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmims-harvard%2Fpinnacle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmims-harvard%2Fpinnacle/lists"}