{"id":22582460,"url":"https://github.com/paccmann/paccmann_omics","last_synced_at":"2025-04-10T19:12:04.295Z","repository":{"id":47000915,"uuid":"219130214","full_name":"PaccMann/paccmann_omics","owner":"PaccMann","description":"Generative models for transcriptomics profiles and proteins","archived":false,"fork":false,"pushed_at":"2021-09-17T23:24:24.000Z","size":56,"stargazers_count":8,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-24T16:53:29.070Z","etag":null,"topics":["deep-learning","generative-model","proteomics","transcriptomics","vae","variational-autoencoder"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PaccMann.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-11-02T09:15:06.000Z","updated_at":"2024-10-08T23:39:06.000Z","dependencies_parsed_at":"2022-07-26T13:30:07.896Z","dependency_job_id":null,"html_url":"https://github.com/PaccMann/paccmann_omics","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaccMann%2Fpaccmann_omics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaccMann%2Fpaccmann_omics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaccMann%2Fpaccmann_omics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaccMann%2Fpaccmann_omics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PaccMann","download_url":"https://codeload.github.c
om/PaccMann/paccmann_omics/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248279803,"owners_count":21077408,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","generative-model","proteomics","transcriptomics","vae","variational-autoencoder"],"created_at":"2024-12-08T06:10:18.767Z","updated_at":"2025-04-10T19:12:04.274Z","avatar_url":"https://github.com/PaccMann.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://github.com/PaccMann/paccmann_omics/actions/workflows/build.yml/badge.svg)](https://github.com/PaccMann/paccmann_omics/actions/workflows/build.yml)\n# paccmann_omics\n\nGenerative models of omic data for PaccMann\u003csup\u003eRL\u003c/sup\u003e.\n\n`paccmann_omics` is a package to model omic data, with examples for generative \nmodels of gene expression profiles and encoded proteins (vector representations).\nFor example, see our papers:\n- [_PaccMann\u003csup\u003eRL\u003c/sup\u003e: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning_](https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6) (_iScience_, 2021). In there, we use a denoising, dense VAE to model gene expression profiles from TCGA (code in this repo). 
We then use these encodings to conditionally generate de novo molecules with high predicted efficacy against these cell types.\n- [Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2](https://iopscience.iop.org/article/10.1088/2632-2153/abe808) (_Machine Learning: Science and Technology_, 2021). In there, we use a denoising, dense VAE to model proteins from UniProt (code in this repo). We then use a set of 41 SARS-CoV-2 related proteins to conditionally generate de novo molecules with high predicted binding affinity against these proteins.\n\n## Requirements\n\n- `conda\u003e=3.7`\n\n## Installation\n\nThe library itself has few dependencies (see [setup.py](setup.py)) with loose requirements. \nTo run the example training script we provide environment files under `examples/`.\n\nCreate a conda environment:\n\n```sh\nconda env create -f examples/gene_expression/conda.yml\n```\n\nActivate the environment:\n\n```sh\nconda activate paccmann_omics\n```\n\nInstall in editable mode for development:\n\n```sh\npip install -e .\n```\n\n## Example usage\n\nIn the `examples` directory is a training script `train_vae.py` that makes use\nof paccmann_omics.\n\n```console\n(paccmann_omics) $ python examples/gene_expression/train_vae.py -h\nusage: train_vae.py [-h]\n                    train_filepath val_filepath gene_filepath model_path\n                    params_filepath training_name\n\nOmics VAE training script.\n\npositional arguments:\n  train_filepath   Path to the training data (.csv).\n  val_filepath     Path to the validation data (.csv).\n  gene_filepath    Path to a pickle object containing list of genes.\n  model_path       Directory where the model will be stored.\n  params_filepath  Path to the parameter file.\n  training_name    Name for the training.\n\noptional arguments:\n  -h, --help       show this help message and exit\n```\n\n`params_filepath` could point to 
[examples/gene_expression/example_params.json](examples/gene_expression/example_params.json); example files for the other arguments can be downloaded from [here](https://ibm.box.com/v/paccmann-pytoda-data).\n\n## References\n\nIf you use `paccmann_omics` in your projects, please cite the following:\n\n```bib\n@article{born2021datadriven,\n  author = {Born, Jannis and Manica, Matteo and Cadow, Joris and Markert, Greta and Mill, Nil Adell and Filipavicius, Modestas and Janakarajan, Nikita and Cardinale, Antonio and Laino, Teodoro and {Rodr{\\'{i}}guez Mart{\\'{i}}nez}, Mar{\\'{i}}a},\n  doi = {10.1088/2632-2153/abe808},\n  issn = {2632-2153},\n  journal = {Machine Learning: Science and Technology},\n  number = {2},\n  pages = {025024},\n  title = {{Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2}},\n  url = {https://iopscience.iop.org/article/10.1088/2632-2153/abe808},\n  volume = {2},\n  year = {2021}\n}\n\n@article{born2021paccmannrl,\n  title = {PaccMann\\textsuperscript{RL}: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning},\n  journal = {iScience},\n  volume = {24},\n  number = {4},\n  pages = {102269},\n  year = {2021},\n  issn = {2589-0042},\n  doi = {10.1016/j.isci.2021.102269},\n  url = {https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6},\n  author = {Born, Jannis and Manica, Matteo and Oskooei, Ali and Cadow, Joris and Markert, Greta and {Rodr{\\'{i}}guez Mart{\\'{i}}nez}, Mar{\\'{i}}a}\n}\n\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaccmann%2Fpaccmann_omics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaccmann%2Fpaccmann_omics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaccmann%2Fpaccmann_omics/lists"}