# MolPLA
![img](./figures/molpla_model.jpg)
## Abstract (submitted to ISMB 2024 Proceedings Track)
Molecular core structures and R-groups are essential concepts in drug development. Integrating these concepts with conventional graph pre-training approaches can promote a deeper understanding of molecules. We propose MolPLA, a novel pre-training framework that employs masked graph contrastive learning to understand the underlying decomposable parts of molecules that constitute their core structure and peripheral R-groups. Furthermore, we formulate an additional framework that grants MolPLA the ability to help chemists find replaceable R-groups in lead optimization scenarios. Experimental results on molecular property prediction show that MolPLA achieves predictive performance comparable to current state-of-the-art models. Qualitative analysis indicates that MolPLA can distinguish core and R-group sub-structures, identify decomposable regions in molecules, and contribute to lead optimization scenarios by rationally suggesting R-group replacements for various query core templates.
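For intuition, the masked graph contrastive objectives configured below (e.g., `main_graph_contrastive`) resemble temperature-scaled contrastive losses between paired graph embeddings. The following is a minimal, illustrative InfoNCE-style sketch in PyTorch, not the authors' `dualentropy` score function; the embedding size and `tau` mirror `hidden_dim: 300` and the temperatures in **settings.yaml**:

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Generic InfoNCE between two batches of paired graph embeddings,
    e.g., a masked molecule graph and its decomposed core/R-group view.
    Positives sit on the diagonal of the similarity matrix."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = (z_a @ z_b.T) / tau               # (batch, batch) similarities
    targets = torch.arange(z_a.size(0))        # i-th row matches i-th column
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings standing in for encoder outputs.
loss = info_nce(torch.randn(8, 300), torch.randn(8, 300), tau=0.1)
```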
## How to run the experiments
### Step 1. Edit the configuration file **settings.yaml**.
```
example:
  dev_mode:
    debugging:
    toy_test:
  wandb:
    project_name: example_project_name
    session_name: example_session_name
    group_name:
  ddp:
    port: 13000
  path:
    dataset: /path/to/folder/named/datasets
    checkpoint: /path/to/folder/named/checkpoints
  dataprep:
    dataset: geom
    version: v11
    subsample: 1.0
  experiment:
    testing_mode: false
    random_seed: 911012
    which_best: loss
  model_params:
    model_type: molpla
    hidden_dim: 300
    dropout_rate: 0.0
    graph_encoder: GNN
    gnn_params:
      aggr: add
      JK: concat
      gnn_type: gin
      num_layer: 3
    graph_pooling: add
    graph_projector: mlp
    link_decoder: mlp
    stop_gradient_arms: False
    stop_gradient_core: False
    separate_linker_nodes: False
    prop_conditioned: arms
    faiss_metric: inner_product
  train_params:
    batch_size: 4096
    num_epochs: 200
    optimizer: adam
    scheduler: CyclicLR
    learning_rate: 0.00001
    weight_decay: 0.0
    early_stopping: loss
    early_patience: 30
    pretraining:
      main_graph_contrastive:
        loss_coef: 0.1
        score_func: dualentropy
        tau: 0.1
      dcpd_graph_contrastive:
        loss_coef: 0.1
        score_func: dualentropy
        tau: 0.05
      linker_node_contrastive:
        loss_coef: 0.8
        score_func: dualentropy
        tau: 0.01

example_bench:
  dataprep:
    dataset:
    version:
    subsample:
  experiment:
    testing_mode: false
    random_seed: 8888
    which_best: loss
  model_params:
    dropout_rate: 0.1
  train_params:
    batch_size: 256
    num_epochs: 100
    optimizer: adam
    scheduler: dummy
    learning_rate: 0.0001
    weight_decay: 0.0
    early_stopping:
    early_patience: 100
    finetuning:
      from_pretrained: pretrained_geom_v11
      freeze_pretrained: False
```
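For orientation, each top-level key in **settings.yaml** (e.g., `example`) is a session block, which the run scripts appear to select via the `-sn` flag. A minimal sketch of reading such a block, assuming PyYAML (the repository's own config loader may differ):

```python
import yaml  # pip install pyyaml

# Load settings.yaml and pick one session block by name.
with open("settings.yaml") as f:
    settings = yaml.safe_load(f)

session = settings["example"]
print(session["model_params"]["model_type"])  # molpla
print(session["train_params"]["scheduler"])   # CyclicLR
```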
- Possible arguments for selected fields:
  - **example.model_params.model_type**: ```molpla```
  - **example.train_params.scheduler**: ```dummy```, ```CyclicLR```
  - **example.train_params.pretraining.linker_node_contrastive.score_func**: ```dualentropy```
- All experiment reports are uploaded to your WANDB account.
- You can download the datasets from our Google Drive. The current version is ```v11```.

### Step 2. Run the following script
```
python run.py -sn main -mg {GPU indices separated by comma}
```
- This script will pretrain the molecule representation model and then perform benchmark experiments (finetune-and-test) on various molecule property prediction datasets including *freesolv*, *lipophilicity*, *esol*, *toxcast*, *tox21*, *sider*, *bbbp*, *bace* and *clintox*.
- If you want to skip the pretraining phase, add *-sp* to the above script.
- If you want to run only the pretraining code, either to adjust the hyperparameters or to look into the **R-Group Retrieval Task**, run this code instead.

```
python run_pretrain.py -sn example -mg {GPU indices separated by comma}
```
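On the **R-Group Retrieval Task**: given `faiss_metric: inner_product` in the config, retrieval plausibly ranks candidate R-groups by maximum inner product against a query core embedding. A minimal FAISS sketch with random vectors standing in for MolPLA embeddings (illustrative only; the actual retrieval code lives in this repository):

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 300  # matches model_params.hidden_dim
rng = np.random.default_rng(0)

# Stand-ins for MolPLA embeddings: a bank of R-group vectors and one query core.
rgroup_bank = rng.standard_normal((10_000, d)).astype("float32")
query_core = rng.standard_normal((1, d)).astype("float32")

index = faiss.IndexFlatIP(d)             # exact maximum-inner-product search
index.add(rgroup_bank)
scores, ids = index.search(query_core, 5)
print(ids[0], scores[0])                 # top-5 R-group indices and their scores
```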
### Downloading the Preprocessed Dataset and Pretrained Model
- The dataset contains all pre-processed data used to pre-train MolPLA and perform benchmark tests on molecule property prediction. [GOOGLE DRIVE DOWNLOAD LINK](https://drive.google.com/file/d/1sgWVvZ3ln56D9GP7u5VoUhoP4MD0IQTR/view?usp=sharing)
- This Google Drive folder contains all the files, including the model checkpoints with pre-trained parameters. Note that you might have to edit the directory configuration inside **model_config.pkl** (see the sketch below). [GOOGLE DRIVE DOWNLOAD LINK](https://drive.google.com/drive/folders/1fEtaPKuwDihHAprxQgWg2eKPXtNCi5xv?usp=drive_link)
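If the directories baked into **model_config.pkl** do not match your machine, they can be rewritten before use. A minimal sketch, assuming the pickle holds a dict with a `path` section mirroring **settings.yaml** (the key names here are hypothetical; inspect the object first):

```python
import pickle

with open("model_config.pkl", "rb") as f:
    config = pickle.load(f)
print(config)  # inspect the actual structure before editing

# Hypothetical keys mirroring settings.yaml's path section.
config["path"]["dataset"] = "/your/path/to/datasets"
config["path"]["checkpoint"] = "/your/path/to/checkpoints"

with open("model_config.pkl", "wb") as f:
    pickle.dump(config, f)
```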
## Contributors

| Name | Affiliation | Email |
|------|-------------|-------|
| Mogan Gim† | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
| Jueon Park† | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
| Soyon Park | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
| Sanghoon Lee | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
| Seungheun Baek | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
| Junhyun Lee | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
| Ngoc-Quang Nguyen | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
| Jaewoo Kang* | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea | [email protected] |
- †: *Equal Contributors*
- *: *Corresponding Author*