https://github.com/paccmann/paccmann_omics
Generative models for transcriptomics profiles and proteins
- Host: GitHub
- URL: https://github.com/paccmann/paccmann_omics
- Owner: PaccMann
- License: mit
- Created: 2019-11-02T09:15:06.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2021-09-17T23:24:24.000Z (over 4 years ago)
- Last Synced: 2025-03-24T16:53:29.070Z (11 months ago)
- Topics: deep-learning, generative-model, proteomics, transcriptomics, vae, variational-autoencoder
- Language: Python
- Size: 54.7 KB
- Stars: 8
- Watchers: 3
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# paccmann_omics
Generative models of omic data for PaccMannRL.
`paccmann_omics` is a package to model omic data, with examples for generative
models of gene expression profiles and encoded proteins (vector representations).
For example, see our papers:
- [_PaccMannRL: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning_](https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6) (_iScience_, 2021). There, we use a denoising, dense VAE to model gene expression profiles from TCGA (code in this repo). We then use these encodings to conditionally generate de novo molecules with high predicted efficacy against these cell types.
- [_Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2_](https://iopscience.iop.org/article/10.1088/2632-2153/abe808) (_Machine Learning: Science and Technology_, 2021). There, we use a denoising, dense VAE to model proteins from UniProt (code in this repo). We then use a set of 41 SARS-CoV-2-related proteins to conditionally generate de novo molecules with high predicted binding affinity to these proteins.
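Both papers rely on a denoising, dense VAE. The sketch below illustrates the ingredients of such a model in plain Python: the input is corrupted with noise, encoded into a mean and log-variance, sampled with the reparameterization trick, decoded, and scored against the *clean* input plus a KL term. It is not the `paccmann_omics` implementation. The single linear layer per encoder/decoder head, the additive Gaussian corruption, the MSE reconstruction loss, the `beta` weight and all function names are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical parameter layout: name -> (weight matrix, bias vector).
Params = Dict[str, Tuple[Matrix, Vector]]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> Vector:
    """Fully connected layer: y_j = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def corrupt(x: Sequence[float], noise_std: float, rng: random.Random) -> Vector:
    """Denoising corruption (assumed here): additive Gaussian noise on every feature."""
    return [xi + rng.gauss(0.0, noise_std) for xi in x]


def reparameterize(mu: Vector, logvar: Vector, rng: random.Random) -> Vector:
    """Reparameterization trick: z = mu + exp(logvar / 2) * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Vector, logvar: Vector) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def forward(x: Sequence[float], params: Params, noise_std: float, rng: random.Random):
    """Corrupt -> encode (mu, logvar) -> sample z -> decode."""
    x_noisy = corrupt(x, noise_std, rng)
    mu = dense(x_noisy, *params["enc_mu"])
    logvar = dense(x_noisy, *params["enc_logvar"])
    z = reparameterize(mu, logvar, rng)
    x_hat = dense(z, *params["dec"])
    return x_hat, mu, logvar


def denoising_vae_loss(x_clean, x_hat, mu, logvar, beta: float = 1.0) -> float:
    """Reconstruction is scored against the clean input, not the corrupted one."""
    return mse(x_clean, x_hat) + beta * kl_to_standard_normal(mu, logvar)


if __name__ == "__main__":
    rng = random.Random(0)
    n_genes, n_latent = 4, 2  # toy sizes; real profiles have thousands of genes

    def init(rows: int, cols: int) -> Tuple[Matrix, Vector]:
        weights = [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
        return weights, [0.0] * rows

    params: Params = {
        "enc_mu": init(n_latent, n_genes),
        "enc_logvar": init(n_latent, n_genes),
        "dec": init(n_genes, n_latent),
    }
    profile = [0.2, -1.3, 0.8, 0.0]  # one made-up, standardized expression profile
    x_hat, mu, logvar = forward(profile, params, noise_std=0.1, rng=rng)
    print(f"loss = {denoising_vae_loss(profile, x_hat, mu, logvar):.4f}")
```

Training would minimize this loss over many profiles with gradient descent in a framework such as PyTorch; the point here is only the order of operations and where the clean versus corrupted input enters.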
## Requirements
- `conda>=3.7`
## Installation
The library itself has few dependencies (see [setup.py](setup.py)), with loose version requirements.
To run the example training script, use the conda environment files provided under `examples/`.
Create a conda environment:
```sh
conda env create -f examples/gene_expression/conda.yml
```
Activate the environment:
```sh
conda activate paccmann_omics
```
Install in editable mode for development:
```sh
pip install -e .
```
## Example usage
The `examples/gene_expression` directory contains a training script, `train_vae.py`, that uses
`paccmann_omics`.
```console
(paccmann_omics) $ python examples/gene_expression/train_vae.py -h
usage: train_vae.py [-h]
                    train_filepath val_filepath gene_filepath model_path
                    params_filepath training_name

Omics VAE training script.

positional arguments:
  train_filepath   Path to the training data (.csv).
  val_filepath     Path to the validation data (.csv).
  gene_filepath    Path to a pickle object containing list of genes.
  model_path       Directory where the model will be stored.
  params_filepath  Path to the parameter file.
  training_name    Name for the training.

optional arguments:
  -h, --help       show this help message and exit
```
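All six positional arguments are required and must appear in the order of the usage line. A minimal sketch of assembling (and optionally launching) that command from Python, assuming the script path from the help output; `build_train_command`, `run_training` and the file names in the example are hypothetical:

```python
import shlex
import subprocess
import sys
from typing import List


def build_train_command(
    train_filepath: str,
    val_filepath: str,
    gene_filepath: str,
    model_path: str,
    params_filepath: str,
    training_name: str,
    script: str = "examples/gene_expression/train_vae.py",
) -> List[str]:
    """Assemble argv for train_vae.py; the positional order follows its usage line."""
    return [
        sys.executable,
        script,
        train_filepath,
        val_filepath,
        gene_filepath,
        model_path,
        params_filepath,
        training_name,
    ]


def run_training(cmd: List[str]) -> None:
    """Launch training; raises CalledProcessError if the script exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; substitute your own data.
    cmd = build_train_command(
        "data/train.csv",
        "data/val.csv",
        "data/genes.pkl",
        "models/",
        "examples/gene_expression/example_params.json",
        "tcga_vae",
    )
    print(shlex.join(cmd))  # inspect the command before calling run_training(cmd)
```

Using `sys.executable` runs the script with the interpreter of the active `paccmann_omics` environment rather than whatever `python` happens to be first on `PATH`.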
`params_filepath` can point to [examples/gene_expression/example_params.json](examples/gene_expression/example_params.json). Example files for the other arguments can be downloaded from [here](https://ibm.box.com/v/paccmann-pytoda-data).
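`gene_filepath` expects a pickled list of genes. If you start from your own expression tables, the sketch below builds that list from the header of the training CSV and checks that the validation CSV covers the same genes. It assumes one row per sample, one column per gene, and a leading sample-index column; whether `train_vae.py` expects exactly this layout is an assumption, so compare against the downloadable example files. All function names are hypothetical.

```python
import csv
import pickle
from typing import List


def read_gene_columns(csv_path: str, has_index_column: bool = True) -> List[str]:
    """Return gene names from the CSV header row.

    Assumed layout: one row per sample, one column per gene, and (optionally)
    a leading column holding the sample identifiers.
    """
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle))
    return header[1:] if has_index_column else header


def write_gene_list(genes: List[str], pickle_path: str) -> None:
    """Pickle a plain Python list of gene names."""
    with open(pickle_path, "wb") as handle:
        pickle.dump(list(genes), handle)


def load_gene_list(pickle_path: str) -> List[str]:
    """Load the pickled gene list and check its type (only unpickle trusted files)."""
    with open(pickle_path, "rb") as handle:
        genes = pickle.load(handle)
    if not isinstance(genes, list):
        raise TypeError(f"expected a list of genes, got {type(genes).__name__}")
    return genes


def missing_genes(csv_path: str, genes: List[str], has_index_column: bool = True) -> List[str]:
    """Genes from the list that do not appear as columns in csv_path."""
    present = set(read_gene_columns(csv_path, has_index_column))
    return [gene for gene in genes if gene not in present]


# Example (hypothetical paths):
#   genes = read_gene_columns("data/train.csv")
#   write_gene_list(genes, "data/genes.pkl")
#   assert not missing_genes("data/val.csv", genes)
```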
## References
If you use `paccmann_omics` in your projects, please cite the following:
```bib
@article{born2021datadriven,
author = {Born, Jannis and Manica, Matteo and Cadow, Joris and Markert, Greta and Mill, Nil Adell and Filipavicius, Modestas and Janakarajan, Nikita and Cardinale, Antonio and Laino, Teodoro and {Rodr{\'{i}}guez Mart{\'{i}}nez}, Mar{\'{i}}a},
doi = {10.1088/2632-2153/abe808},
issn = {2632-2153},
journal = {Machine Learning: Science and Technology},
number = {2},
pages = {025024},
title = {{Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2}},
url = {https://iopscience.iop.org/article/10.1088/2632-2153/abe808},
volume = {2},
year = {2021}
}
@article{born2021paccmannrl,
author = {Born, Jannis and Manica, Matteo and Oskooei, Ali and Cadow, Joris and Markert, Greta and {Rodr{\'{i}}guez Mart{\'{i}}nez}, Mar{\'{i}}a},
doi = {10.1016/j.isci.2021.102269},
issn = {2589-0042},
journal = {iScience},
number = {4},
pages = {102269},
title = {{PaccMann\textsuperscript{RL}: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning}},
url = {https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6},
volume = {24},
year = {2021}
}
```