https://github.com/rose-stl-lab/fs-cap
few-shot compound activity regression
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/rose-stl-lab/fs-cap
- Owner: Rose-STL-Lab
- Created: 2023-05-04T00:57:24.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-08-19T18:03:34.000Z (almost 2 years ago)
- Last Synced: 2025-06-13T06:08:02.814Z (12 months ago)
- Language: Python
- Size: 673 KB
- Stars: 13
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Few-Shot Compound Activity Prediction (FS-CAP)
This repository contains code for the few-shot compound activity prediction (FS-CAP) algorithm.
## Docker instructions
First, add a trained model file to the same folder as the `Dockerfile`. You can use the one provided [here](https://drive.google.com/file/d/1SD8H5j6U7gyZOI_oncZrEzDBrz-z7-Ng/view?usp=sharing), which was trained with 8 context compounds. Then run `sudo docker build -t fscap .` to build the container, and run `sudo docker run fscap` followed by any command line arguments. For example, `sudo docker run fscap --context_smiles "" --context_activities --query_smiles ""`.
## Requirements
[RDKit](https://www.rdkit.org/docs/Install.html) is required. All code was tested in Python 3.10. The following pip packages are also required:
```
torch
scipy
scikit-learn
numpy
tqdm
```
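Before training or inference, it can help to confirm that the environment has everything installed. The following is a minimal sketch, not part of the repository: the helper `find_missing` is a hypothetical name, and the list assumes the usual import names (`sklearn` for the `scikit-learn` pip package, `rdkit` for RDKit).

```python
import importlib.util

# Import names for the packages listed above. The pip package scikit-learn
# is imported as `sklearn`; RDKit is imported as `rdkit`.
REQUIRED_MODULES = ["rdkit", "torch", "scipy", "sklearn", "numpy", "tqdm"]


def find_missing(modules):
    """Return the top-level module names in `modules` that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

The check only looks the modules up without importing them, so it runs quickly even when `torch` is installed.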
## Preprocessing
We only provide preprocessing code for BindingDB, which is used for training. Testing a trained model on other datasets should be relatively straightforward with the `score_compounds.py` script.
### BindingDB
`preprocess_bindingdb.py` contains code to extract and preprocess data from BindingDB. Calling `python preprocess_bindingdb.py` loads data from `BindingDB_All.tsv`, which should be placed in the folder beforehand. When the script finishes, it produces a `bindingdb_data.pickle` file that is ready for training. For the paper, we used `BindingDB_All.tsv` from BindingDB's [Download](https://www.bindingdb.org/rwd/bind/chemsearch/marvin/SDFdownload.jsp?all_download=yes) page, available [here](https://www.bindingdb.org/bind/downloads/BindingDB_All_2022m8.tsv.zip).
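To sanity-check the preprocessing output before a long training run, one option is to load the pickle and look at its top-level type and size. This is a rough sketch: the internal structure of `bindingdb_data.pickle` is not documented here, so the hypothetical helper `summarize_pickle` makes no assumption about it beyond the file being a standard Python pickle.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and return (type name, length or None)."""
    # Only unpickle files you created or trust: loading a pickle can
    # execute arbitrary code.
    with open(path, "rb") as f:
        data = pickle.load(f)
    size = len(data) if hasattr(data, "__len__") else None
    return type(data).__name__, size
```

For example, `print(summarize_pickle("bindingdb_data.pickle"))` reports the container type and how many top-level entries it holds.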
## Training
`train.py` contains the main script to train FS-CAP. By default, the model trains with 8 context compounds and saves TensorBoard logs to the `logs` folder. After training, it saves the model file to `model.pt`. Other model hyperparameters can be found and adjusted in the `config` variable in the `train.py` file.
## Inference
`score_compounds.py` uses the trained model to perform inference on a given set of context and query compounds. It accepts the following parameters:
- `--context_smiles`: the SMILES strings of the context molecules, separated by semicolons (e.g. `CCC;CCCC;CCCCC`).
- `--context_activities`: the associated activities in nanomoles/liter (nM), also separated by semicolons (e.g. `1000;1;10` if the activities are 1000 nM, 1 nM, and 10 nM, respectively).
- `--query_smiles`: the SMILES string(s) of the query molecule(s); if there are multiple, separate them with semicolons.
- `--model_file`: the path to the trained model (default `model.pt`).
- `--encoding_dim`: the `encoding_dim` parameter used in training (default 512).

The script prints the activity prediction for each query molecule to stdout in nM, one prediction per line.
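When the context data already lives in Python, assembling the semicolon-separated arguments by hand is error-prone. The sketch below shows one way to build the command and parse the output. The helpers `build_score_command`, `parse_predictions`, and `score` are hypothetical and not part of the repository. It assumes the encoding dimension is passed as `--encoding_dim` like the other flags, and that each line of stdout is a bare number.

```python
import subprocess
import sys


def build_score_command(context, query_smiles, model_file="model.pt",
                        encoding_dim=512, script="score_compounds.py"):
    """Build the argument list for score_compounds.py.

    context: list of (SMILES, activity in nM) pairs.
    query_smiles: list of SMILES strings to score.
    """
    if not context or not query_smiles:
        raise ValueError("need at least one context and one query compound")
    # Keep SMILES and activities in the same order so they stay paired.
    context_smiles = ";".join(smiles for smiles, _ in context)
    context_activities = ";".join(str(activity) for _, activity in context)
    return [
        sys.executable, script,
        "--context_smiles", context_smiles,
        "--context_activities", context_activities,
        "--query_smiles", ";".join(query_smiles),
        "--model_file", model_file,
        "--encoding_dim", str(encoding_dim),  # assumed flag spelling
    ]


def parse_predictions(stdout):
    """Parse one predicted activity (nM) per non-empty line of stdout."""
    return [float(line) for line in stdout.splitlines() if line.strip()]


def score(context, query_smiles, **kwargs):
    """Run the script and return predictions in nM (needs a trained model)."""
    cmd = build_score_command(context, query_smiles, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_predictions(result.stdout)
```

Passing the arguments as a list rather than a shell string avoids quoting problems, since SMILES strings often contain characters such as `(`, `)`, and `#` that a shell would interpret. A call might look like `score([("CCC", 1000), ("CCCC", 1), ("CCCCC", 10)], ["CCO"])`, run from the repository folder with `model.pt` present.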