{"id":22582444,"url":"https://github.com/paccmann/paccmann_rl","last_synced_at":"2025-03-28T16:44:07.970Z","repository":{"id":47001237,"uuid":"219799547","full_name":"PaccMann/paccmann_rl","owner":"PaccMann","description":"Code pipeline for the PaccMann^RL in iScience: https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6","archived":false,"fork":false,"pushed_at":"2022-02-10T15:51:38.000Z","size":32,"stargazers_count":32,"open_issues_count":1,"forks_count":9,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-02-02T17:21:16.661Z","etag":null,"topics":["de-novo-drug-design","deep-learning","drug-discovery","drug-sensitivity","generative-models","transcriptomics"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PaccMann.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-11-05T17:06:51.000Z","updated_at":"2025-01-17T11:46:45.000Z","dependencies_parsed_at":"2022-08-26T10:40:40.673Z","dependency_job_id":null,"html_url":"https://github.com/PaccMann/paccmann_rl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaccMann%2Fpaccmann_rl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaccMann%2Fpaccmann_rl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaccMann%2Fpaccmann_rl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaccMann%2Fpaccmann_rl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PaccMann",
"download_url":"https://codeload.github.com/PaccMann/paccmann_rl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246068283,"owners_count":20718501,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["de-novo-drug-design","deep-learning","drug-discovery","drug-sensitivity","generative-models","transcriptomics"],"created_at":"2024-12-08T06:09:57.344Z","updated_at":"2025-03-28T16:44:07.930Z","avatar_url":"https://github.com/PaccMann.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],
"readme":"[![Build Status](https://github.com/PaccMann/paccmann_rl/actions/workflows/build.yml/badge.svg)](https://github.com/PaccMann/paccmann_rl/actions/workflows/build.yml)\n# paccmann_rl\n\nPipeline to reproduce the results of the [PaccMann\u003csup\u003eRL\u003c/sup\u003e paper](https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6) published in _iScience_.\n\n## Description\n\nIn this repo, we provide a conda environment and instructions to reproduce the pipeline described in the manuscript:\n\n1. Train a multimodal drug sensitivity predictor ([source code](https://github.com/PaccMann/paccmann_predictor))\n2. Train a generative model for omic profiles, also known as the PVAE ([source code](https://github.com/PaccMann/paccmann_omics))\n3. Train a generative model for molecules, also known as the SVAE ([source code](https://github.com/PaccMann/paccmann_chemistry))\n4. Train PaccMann^RL ([source code](https://github.com/PaccMann/paccmann_generator))\n\n## Requirements\n\n- `conda\u003e=3.7`\n- The following data, available [here](https://ibm.ent.box.com/v/paccmann-pytoda-data):\n  - The processed data splits from the folder `splitted_data`\n  - The processed gene expression data from [GDSC](https://www.cancerrxgene.org/): `data/gene_expression/gdsc-rnaseq_gene-expression.csv`\n  - The processed SMILES of the drugs from [GDSC](https://www.cancerrxgene.org/): `data/smiles/gdsc.smi`\n  - A pickled [SMILESLanguage](https://github.com/PaccMann/paccmann_datasets/blob/master/pytoda/smiles/smiles_language.py) object (`data/smiles_language_chembl_gdsc_ccle.pkl`)\n  - A pickled list of genes representing the panel considered in the paper (`data/2128_genes.pkl`)\n  - A pickled pandas DataFrame containing expression values and metadata for the cell lines considered in the paper (`data/gdsc_transcriptomics_for_conditional_generation.pkl`)\n- The git repos linked in the [previous section](#description)\n\n**NOTE:** please refer to the [README.md](https://ibm.ent.box.com/v/paccmann-pytoda-data/file/548614344106) and to the manuscript for details on the datasets used and the preprocessing applied.\n\n## Setup\n\n### Install the environment\n\nCreate a conda environment:\n\n```sh\nconda env create -f conda.yml\n```\n\nActivate the environment:\n\n```sh\nconda activate paccmann_rl\n```\n\n### Download data\n\nDownload the data listed in the [requirements section](#requirements).\nFrom now on, we assume that the files are stored in a folder called `data` in the root of the repository, with the following structure:\n\n```console\ndata\n├── 2128_genes.pkl\n├── gdsc-rnaseq_gene-expression.csv\n├── gdsc.smi\n├── gdsc_transcriptomics_for_conditional_generation.pkl\n├── smiles_language_chembl_gdsc_ccle.pkl\n└── splitted_data\n    ├── gdsc_cell_line_ic50_test_fraction_0.1_id_997_seed_42.csv\n    ├── gdsc_cell_line_ic50_train_fraction_0.9_id_997_seed_42.csv\n    ├── tcga_rnaseq_test_fraction_0.1_id_242870585127480531622270373503581547167_seed_42.csv\n    ├── tcga_rnaseq_train_fraction_0.9_id_242870585127480531622270373503581547167_seed_42.csv\n    ├── test_chembl_22_clean_1576904_sorted_std_final.smi\n    └── train_chembl_22_clean_1576904_sorted_std_final.smi\n\n1 directory, 11 files\n```\n\n**NOTE:** no worries, the `data` folder is in the [.gitignore](./.gitignore).\n\n### Clone the repos\n\nTo get the scripts for each of the components, create a `code` folder and clone the repos:\n\n```sh\nmkdir code \u0026\u0026 cd code \u0026\u0026 \\\n  git clone --branch 0.0.1 https://github.com/PaccMann/paccmann_predictor \u0026\u0026 \\\n  git clone --branch 0.0.1 https://github.com/PaccMann/paccmann_omics \u0026\u0026 \\\n  git clone --branch 0.0.1 https://github.com/PaccMann/paccmann_chemistry \u0026\u0026 \\\n  git clone --branch 0.0.1 https://github.com/PaccMann/paccmann_generator \u0026\u0026 \\\n  cd ..\n```\n\n**NOTE:** no worries, the `code` folder is in the [.gitignore](./.gitignore).\n\n## Pipeline\n\nEverything is now set up to run the full pipeline.\n\n**NOTE:** the full pipeline is computationally intensive, and running all the steps on a desktop or laptop might not be feasible. For this reason, we also provide [pretrained models](https://ibm.ent.box.com/v/paccmann-pytoda-data/folder/91897885403) that can be downloaded and used to run the different steps.\n\n**NOTE:** in the following, we assume a folder `models` has been created in the root of the repository. No worries, the `models` folder is in the [.gitignore](./.gitignore).\n\n### Multimodal drug sensitivity predictor\n\n```console\n(paccmann_rl) $ python ./code/paccmann_predictor/examples/train_paccmann.py \\\n    ./data/splitted_data/gdsc_cell_line_ic50_train_fraction_0.9_id_997_seed_42.csv \\\n    ./data/splitted_data/gdsc_cell_line_ic50_test_fraction_0.1_id_997_seed_42.csv \\\n    ./data/gdsc-rnaseq_gene-expression.csv \\\n    ./data/gdsc.smi \\\n    ./data/2128_genes.pkl \\\n    ./data/smiles_language_chembl_gdsc_ccle.pkl \\\n    ./models/ \\\n    ./code/paccmann_predictor/examples/example_params.json paccmann\n```\n\n### PVAE\n\n```console\n(paccmann_rl) $ python ./code/paccmann_omics/examples/train_vae.py \\\n    ./data/splitted_data/tcga_rnaseq_train_fraction_0.9_id_242870585127480531622270373503581547167_seed_42.csv \\\n    ./data/splitted_data/tcga_rnaseq_test_fraction_0.1_id_242870585127480531622270373503581547167_seed_42.csv \\\n    ./data/2128_genes.pkl \\\n    ./models/ \\\n    ./code/paccmann_omics/examples/example_params.json pvae\n```\n\n### SVAE\n\n```console\n(paccmann_rl) $ python ./code/paccmann_chemistry/examples/train_vae.py \\\n    ./data/splitted_data/train_chembl_22_clean_1576904_sorted_std_final.smi \\\n    ./data/splitted_data/test_chembl_22_clean_1576904_sorted_std_final.smi \\\n    ./data/smiles_language_chembl_gdsc_ccle.pkl \\\n    ./models/ \\\n    ./code/paccmann_chemistry/examples/example_params.json svae\n```\n\n### PaccMann^RL\n\n```console\n(paccmann_rl) $ python ./code/paccmann_generator/examples/train_paccmann_rl.py \\\n    ./models/svae \\\n    ./models/pvae \\\n    ./models/paccmann \\\n    ./data/smiles_language_chembl_gdsc_ccle.pkl \\\n    ./data/gdsc_transcriptomics_for_conditional_generation.pkl \\\n    ./code/paccmann_generator/examples/example_params.json \\\n    paccmann_rl breast\n```\n\n**NOTE:** this will create a `biased_models` folder containing the conditional generator and the baseline SMILES generator used, in this case `breast_paccmann_rl` and `baseline`. No worries, the `biased_models` folder is in the [.gitignore](./.gitignore).\n\n## References\n\nIf you use `paccmann_rl` in your projects, please cite the following:\n\n```bib\n@article{born2021paccmannrl,\n  title = {PaccMann\\textsuperscript{RL}: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning},\n  journal = {iScience},\n  volume = {24},\n  number = {4},\n  pages = {102269},\n  year = {2021},\n  issn = {2589-0042},\n  doi = {10.1016/j.isci.2021.102269},\n  url = {https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6},\n  author = {Born, Jannis and Manica, Matteo and Oskooei, Ali and Cadow, Joris and Markert, Greta and {Rodr{\\'{i}}guez Mart{\\'{i}}nez}, Mar{\\'{i}}a}\n}\n```\n",
"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaccmann%2Fpaccmann_rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaccmann%2Fpaccmann_rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaccmann%2Fpaccmann_rl/lists"}