https://github.com/paccmann/chemical_representation_learning_for_toxicity_prediction

Chemical representation learning paper in Digital Discovery
https://github.com/paccmann/chemical_representation_learning_for_toxicity_prediction

chemoinformatics deep-learning property-prediction pytorch qsar toxicity-prediction

Last synced: 2 months ago
JSON representation

Chemical representation learning paper in Digital Discovery

Host: GitHub
URL: https://github.com/paccmann/chemical_representation_learning_for_toxicity_prediction
Owner: PaccMann
License: mit
Created: 2020-09-09T10:33:35.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2024-05-22T20:38:48.000Z (about 1 year ago)
Last Synced: 2025-03-24T16:53:10.669Z (3 months ago)
Topics: chemoinformatics, deep-learning, property-prediction, pytorch, qsar, toxicity-prediction
Language: Jupyter Notebook
Homepage: https://pubs.rsc.org/en/content/articlelanding/2023/dd/d2dd00099g
Size: 975 KB
Stars: 59
Watchers: 6
Forks: 16
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        [![Build Status](https://github.com/PaccMann/toxsmi/actions/workflows/build.yml/badge.svg)](https://github.com/PaccMann/toxsmi/actions/workflows/build.yml)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[![Gradio demo](https://img.shields.io/website-up-down-green-red/https/hf.space/gradioiframe/GT4SD/molecular_properties/+.svg?label=demo%20status)](https://huggingface.co/spaces/GT4SD/molecular_properties)

## Chemical Representation Learning for Toxicity Prediction

PyTorch implementation related to the paper *Chemical Representation Learning for Toxicity Prediction* ([Born et al, 2023, *Digital Discovery*](https://pubs.rsc.org/en/content/articlehtml/2023/dd/d2dd00099g)).

# Inference

We released pretrained models for the Tox21, the ClinTox and the SIDER dataset.

## Demo with UI

🤗 A gradio demo with a simple UI is available on [HuggingFace spaces](https://huggingface.co/spaces/GT4SD/molecular_properties)

![Summary](assets/demo.png)

## Python API

The pretrained models are available via the [GT4SD](https://github.com/GT4SD), the Generative Toolkit for Scientific Discovery. See the paper [here](https://arxiv.org/abs/2207.03928).

We recommend to use [GT4SD](https://github.com/GT4SD/gt4sd-core) for inference. Once you install that library, use as follows:

```py

from gt4sd.properties import PropertyPredictorRegistry

tox21 = PropertyPredictorRegistry.get_property_predictor('tox21', {'algorithm_version': 'v0'})

tox21('CCO')

```

The other models are the SIDER model and the ClinTox model from the [MoleculeNet](https://moleculenet.org/datasets-1) benchmark:

```py

from gt4sd.properties import PropertyPredictorRegistry

sider = PropertyPredictorRegistry.get_property_predictor('sider', {'algorithm_version': 'v0'})

clintox = PropertyPredictorRegistry.get_property_predictor('clintox', {'algorithm_version': 'v0'})

print(f"SIDE effect predictions: {sider('CCO')}")

print(f"Clinical toxicitiy predictions: {clintox('CCO')}")

```

# Training your own model

### Setup

The library itself has few dependencies (see [setup.py](setup.py)) with loose requirements. 

```sh

pip install -e .

```

### Start a training

In the `scripts` directory is a training script [train_tox](./scripts/train_tox).

Download sample data from the Tox21 database and store it in a folder called `data`

[here](https://ibm.box.com/s/kahxnlg2k2s0x3z0r5fa6y67tmfhs6or). 

```console

(toxsmi) $ python3 scripts/train_tox \

--train data/tox21_train.csv \

--test data/tox21_score.csv \

--smi data/tox21.smi \

--params params/mca.json \

--model path_to_model_folder \

--name debug

```

**Features**:

- Set ```--finetune``` to the path to a `.pt` file to start from a pretrained model

- Set ```--embedding_path``` to the path of pretrained embeddings

Type `python scripts/train_tox -h` for further help.

### Evaluate a model

In the `scripts` directory is an evaluation script [eval_tox.py](./scripts/eval_tox.py).

Assume you have a trained model, use as follows:

```console

(toxsmi) $ python3 scripts/eval_tox.py \

-model path_to_model_folder \

-smi data/tox21.smi \

-labels data/tox21_test.csv \

-checkpoint RMSE"

```

where `-checkpoint` specifies which `.pt` file to pick for the evaluation (based on substring matching)

## Attention visualization

The model uses a self-attention mechanism that can highlight chemical motifs used for the predictions.

In [notebooks/toxicity_attention_plot.ipynb](notebooks/toxicity_attention_plot.ipynb) we share a tutorial on how to create such plots:

![Attention](assets/attention.gif "toxicophore attention")

## Citation

If you use this code in your projects, please cite the following:

```bib

@article{born2023chemical,

    author = {Born, Jannis and Markert, Greta and Janakarajan, Nikita and Kimber, Talia B. and Volkamer, Andrea and Martínez, María Rodríguez and Manica, Matteo},

    title = {Chemical representation learning for toxicity prediction},

    journal = {Digital Discovery},

    year = {2023},

    pages = {-},

    publisher = {RSC},

    doi = {10.1039/D2DD00099G},

    url = {http://dx.doi.org/10.1039/D2DD00099G}

}

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/paccmann/chemical_representation_learning_for_toxicity_prediction

Awesome Lists containing this project

README