{"id":15118846,"url":"https://github.com/genentech/iterative-perturb-seq","last_synced_at":"2025-04-14T00:32:43.266Z","repository":{"id":198326631,"uuid":"699145349","full_name":"Genentech/iterative-perturb-seq","owner":"Genentech","description":"Sequential Optimal Experimental Design of Perturbation Screens Guided by Multimodal Priors","archived":false,"fork":false,"pushed_at":"2024-05-25T02:27:44.000Z","size":3731,"stargazers_count":32,"open_issues_count":1,"forks_count":2,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-09-26T02:01:36.164Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Genentech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-02T02:42:21.000Z","updated_at":"2024-09-18T18:53:35.000Z","dependencies_parsed_at":null,"dependency_job_id":"71f8f1f4-cb69-432f-b1c4-ef7c99214bfb","html_url":"https://github.com/Genentech/iterative-perturb-seq","commit_stats":null,"previous_names":["genentech/iterative-perturb-seq"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Genentech%2Fiterative-perturb-seq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Genentech%2Fiterative-perturb-seq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Genentech%2Fiterative-perturb-seq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Genentech%2
Fiterative-perturb-seq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Genentech","download_url":"https://codeload.github.com/Genentech/iterative-perturb-seq/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229117070,"owners_count":18022819,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-26T01:53:39.152Z","updated_at":"2024-12-10T19:19:20.383Z","avatar_url":"https://github.com/Genentech.png","language":"Jupyter Notebook","funding_links":[],"categories":["Ranked by starred repositories"],"sub_categories":[],"readme":"# Sequential Optimal Experimental Design of Perturbation Screens\n\nThis repository hosts the code base for the paper\n\n**Sequential Optimal Experimental Design of Perturbation Screens Guided by Multimodal Priors**\\\nKexin Huang, Romain Lopez, Jan-Christian Hütter, Takamasa Kudo, Antonio Rios, Aviv Regev\n\n[[Paper](https://www.biorxiv.org/content/10.1101/2023.12.12.571389v1)]\n\nCorresponding: lopez.romain@gene.com; regev.aviv@gene.com\n\n## Overview\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://github.com/Genentech/iterative-perturb-seq/blob/master/img/illustration.png\" alt=\"logo\" width=\"800px\" /\u003e\u003c/p\u003e\n\nUnderstanding a cell's transcriptional response to genetic perturbations answers vital biological questions such as cell reprogramming and target discovery. Despite significant advances in the Perturb-seq technology, the demand for vast experimental configurations surpasses the capacity for existing assays. 
Recent machine learning models, trained on existing Perturb-seq data sets, predict perturbation outcomes but face hurdles due to sub-optimal training set selection, resulting in weak predictions for the unexplored perturbation space. In this study, we propose a sequential approach to the design of Perturb-seq experiments that uses the model to strategically select the most informative perturbations at each step for follow-up experiments. This enables a significantly more efficient exploration of the perturbation space, while predicting the effect of the rest of the perturbations with high fidelity. We conduct a preliminary data analysis on a large-scale Perturb-seq experiment, which reveals that our setting is severely restricted by the number of examples and rounds, falling into a non-conventional active learning regime called 'active learning under budget'. Motivated by this insight, we develop IterPert, which exploits rich and multi-modal prior knowledge in order to efficiently guide the selection of perturbations. Making use of prior knowledge for this task is novel, and crucial for our setting of active learning under budget. We validate our method using in-silico benchmarking of active learning, constructed from a large-scale CRISPRi Perturb-seq data set. Our benchmarking reveals that IterPert outperforms contemporary active learning strategies, delivering comparable accuracy with only a third of the perturbations profiled. 
All in all, these results demonstrate the potential of sequentially designing perturbation screens.\n\n\n\n## Installation\n\nUse the API:\n\n```bash\nconda create --name iterpert_env python=3.8\nconda activate iterpert_env\nconda install pyg -c pyg\npip install iterpert\n```\n\nUse the raw source code:\n\n```bash\nconda create --name iterpert_env python=3.8\nconda activate iterpert_env\nconda install pyg -c pyg\ngit clone https://github.com/Genentech/iterative-perturb-seq.git\ncd iterative-perturb-seq\npip install -r requirements.txt\n```\n\n## API interface\n\nFirst, initialize the `IterPert` module:\n\n```python\nfrom iterpert.iterpert import IterPert\n\n# Name of the active learning strategy, reused here as the experiment name\nstrategy = 'IterPert'\ninterface = IterPert(weight_bias_track = True, \n                     exp_name = strategy,\n                     device = 'cuda:0', \n                     seed = 1)\n```\n\nThe arguments are:\n\n- `weight_bias_track`: True/False, whether to enable Weights and Biases tracking\n- `device`: CUDA device, e.g. `cuda:0`\n- `proj_name`: Weights and Biases project name\n- `exp_name`: Weights and Biases experiment name\n- `seed`: random seed for the data split\n- `run`: random seed for the training run\n\nThen, initialize the data:\n\n```python\npath = 'YOUR PATH'\ninterface.initialize_data(path = path,\n                          dataset_name='replogle_k562_essential_1000hvg',\n                          batch_size = 256)\n```\n\nThe arguments are:\n\n- `path`: path to save the data\n- `dataset_name`: name of the dataset\n- `batch_size`: number of cells in a batch\n- `test_fraction`: fraction of the data held out as the test set\n\nThen, initialize the GEARS model:\n\n```python\ninterface.initialize_model(epochs = 20, hidden_size = 64)\n```\n\nThe arguments are:\n\n- `epochs`: the number of training epochs\n- `hidden_size`: the hidden dimension size of the model\n- `retrain`: True/False, whether to retrain the model in each round\n\n\nThen, initialize the active learning strategy:\n\nYou can also choose from baselines `Random`, `BALD`, `BatchBALD`, `BAIT`, `ACS-FW`, 
`Core-Set`, `BADGE`, `LCMD` or specify our method `IterPert`\n\n```python\ninterface.initialize_active_learning_strategy(strategy = 'IterPert')\n```\n\nLastly, kick off the training:\n\n```python\ninterface.start(n_init_labeled = 100, n_round = 5, n_query = 100)\n```\n\nThe arguments are:\n\n- `n_init_labeled`: the number of initially labeled samples\n- `n_round`: the number of rounds\n- `n_query`: the number of queries per round\n\n\n\n## Demo\n\nWe provide tutorials to get started with iterative Perturb-seq:\n\n| Name  | Description                                             |\n|-------|---------------------------------------------------------|\n| [Data Tutorial](demo/data_tutorial.ipynb)   | Introduces the data loader and how to use your own data |\n| [Training Tutorial](demo/train_tutorial.ipynb)   | A demo on training IterPert |\n| [Knowledge Kernel Tutorial](demo/knowledge_kernels_process.ipynb)   | A tutorial on creating a knowledge kernel for your own data |\n\n\n## Reproduce experiments\n\nPlease refer to the `reproduce_repo` directory to reproduce each experiment. Notably, its `README.md` lists the sh files that generate all experiments. The `figX.ipynb` notebooks produce the figures. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenentech%2Fiterative-perturb-seq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgenentech%2Fiterative-perturb-seq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgenentech%2Fiterative-perturb-seq/lists"}