https://github.com/rose-stl-lab/fs-cap

few-shot compound activity regression
https://github.com/rose-stl-lab/fs-cap

Last synced: about 1 year ago
JSON representation

few-shot compound activity regression

Host: GitHub
URL: https://github.com/rose-stl-lab/fs-cap
Owner: Rose-STL-Lab
Created: 2023-05-04T00:57:24.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-08-19T18:03:34.000Z (almost 2 years ago)
Last Synced: 2025-06-13T06:08:02.814Z (about 1 year ago)
Language: Python
Size: 673 KB
Stars: 13
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Few-Shot Compound Activity Prediction (FS-CAP)

This repository contains code for the few-shot compound activity prediction (FS-CAP) algorithm.

## Docker instructions
First add a trained model file to the same folder as the `Dockerfile` (or use the one provided [here](https://drive.google.com/file/d/1SD8H5j6U7gyZOI_oncZrEzDBrz-z7-Ng/view?usp=sharing), which was trained with 8 context compounds), then run `sudo docker build -t fscap .` to build the container, then run `sudo docker run fscap` along with any command line arguments. For example, `sudo docker run fscap --context_smiles "" --context_activities --query_smiles ""`.

## Requirements
[RDKit](https://www.rdkit.org/docs/Install.html) is required. All code was tested in Python 3.10. The following pip packages are also required:
```
torch
scipy
scikit-learn
numpy
tqdm
```

## Preprocessing
We only provide code to preprocess BindingDB for training, but testing on other datasets using a trained model should be relatively straightforward using the `score_compounds.py` script.

### BindingDB
`preprocess_bindingdb.py` contains code to extract and preprocess data from BindingDB. Calling `python preprocess_bindingdb.py` will load data from `BindingDB_All.tsv` which should be placed in the folder beforehand, and after running it will produce a `bindingdb_data.pickle` file that is ready for training. For the paper, we used `BindingDB_All.tsv` from BindingDB's [Download](https://www.bindingdb.org/rwd/bind/chemsearch/marvin/SDFdownload.jsp?all_download=yes) page, available [here](https://www.bindingdb.org/bind/downloads/BindingDB_All_2022m8.tsv.zip).

## Training
`train.py` contains the main script to train FS-CAP. By default, the model will train with 8 context compound and will save tensorboard logs to the `logs` folder. After training, it will save the model file to `model.pt`. Other model hyperparameters can be found and adjusted in the `config` variable in the `train.py` file.

## Inference
`score_compounds.py` uses the trained model to perform inference on a given set of context and query compounds. The following parameters must be supplied: `--context_smiles` specifies the SMILES strings of the context molecules, separated by semicolons (e.g. `CCC;CCCC;CCCCC`), and `--context_activities` specifies the associated activites in nanomoles/liter (nM) (e.g. `1000;1;10` if the activites are 1000 nM, 1 nM, and 10 nM, respectively). `--query_smiles` specifies the SMILES string(s) of the query molecule(s) (if multiple, separate with semicolons), `--model_file` specifies the path to the trained model (default `model.pt`), and `encoding_dim` specifies the `encoding_dim` parameter used in training (default 512). The script prints to stdout the activity prediction of the query molecule(s) in nM, one prediction per line.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/rose-stl-lab/fs-cap

Awesome Lists containing this project

README