https://github.com/augus1999/bayesian-flow-network-for-chemistry
ChemBFN: Bayesian Flow Network Framework for Chemistry Tasks. Developed at Hiroshima University.
bayesian-flow-networks cheminformatics chemistry generative-model machine-learning molecular-modeling qsar qspr transformer
- Host: GitHub
- URL: https://github.com/augus1999/bayesian-flow-network-for-chemistry
- Owner: Augus1999
- License: agpl-3.0
- Created: 2024-06-04T10:28:05.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-02-16T02:38:49.000Z (over 1 year ago)
- Last Synced: 2025-03-20T08:19:20.286Z (about 1 year ago)
- Language: Python
- Homepage: https://augus1999.github.io/bayesian-flow-network-for-chemistry/
- Size: 283 KB
- Stars: 20
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ChemBFN: Bayesian Flow Network for Chemistry
[DOI: 10.1021/acs.jcim.4c01792](https://doi.org/10.1021/acs.jcim.4c01792)
[arXiv: 2412.11439](https://arxiv.org/abs/2412.11439)
This is the repository of the PyTorch implementation of the ChemBFN model.
## Features
ChemBFN provides the following state-of-the-art functionalities in an all-in-one-model style:
* SMILES- or SELFIES-based *de novo* molecule generation
* Protein sequence *de novo* generation
* Classifier-free guidance conditional generation (single- or multi-objective optimisation)
* Context-guided conditional generation (inpainting)
* Outstanding out-of-distribution chemical space sampling
* Fast sampling via an ODE solver
* Finetuning for molecular property and activity prediction
* Finetuning for reaction yield prediction
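Classifier-free guidance is commonly implemented by running the same network twice, once with the condition and once with a null condition, and extrapolating from the unconditional prediction towards the conditional one. The sketch below shows that widely used formulation on logits; `cfg_logits`, `softmax`, the guidance weight `w` and the numbers are hypothetical, and ChemBFN's own guidance formula and where it enters the Bayesian flow sampling loop may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cfg_logits(cond, uncond, w):
    """Classifier-free guidance: guided = (1 + w) * cond - w * uncond.

    w = 0 returns the conditional logits unchanged; larger w pushes the
    prediction further in the direction favoured by the condition.
    """
    return [(1 + w) * c - w * u for c, u in zip(cond, uncond)]


# Hypothetical logits over a 3-token vocabulary from the same network,
# once with the objective attached and once with it dropped (null condition).
cond = [2.0, 1.0, 0.1]
uncond = [1.5, 1.2, 0.3]
probs = softmax(cfg_logits(cond, uncond, w=2.0))
print([round(p, 3) for p in probs])
```

Because one network supplies both predictions, such models are typically trained with the condition randomly dropped, so no separate classifier is needed.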
## News
* [30/01/2025] The package `bayesianflow_for_chem` is available on [PyPI](https://pypi.org/project/bayesianflow-for-chem/).
* [21/01/2025] Our first paper has been accepted by [JCIM](https://pubs.acs.org/doi/10.1021/acs.jcim.4c01792).
* [17/12/2024] Our second paper, on out-of-distribution generation, is available on [arxiv.org](https://arxiv.org/abs/2412.11439).
* [31/07/2024] The paper is available on [arxiv.org](https://arxiv.org/abs/2407.20294).
* [21/07/2024] The paper was submitted to arXiv.
## Install
```bash
$ pip install -U bayesianflow_for_chem
```
## Usage
You can find example scripts in the [example](./example) folder.
## Pre-trained Model
You can find pretrained models on the [releases](https://github.com/Augus1999/bayesian-flow-network-for-chemistry/releases) page or on our [Hugging Face model page](https://huggingface.co/suenoomozawa/ChemBFN).
## Dataset Handling
We provide a Python class [`CSVData`](./bayesianflow_for_chem/data.py) to handle data stored in CSV or similar formats with a header row that identifies each column. The following is a quickstart.
1. Download your dataset file (e.g., ESOL from [MoleculeNet](https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/delaney-processed.csv)) and split the file:
```python
>>> from bayesianflow_for_chem.tool import split_data
>>> split_data("delaney-processed.csv", method="scaffold")
```
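A scaffold split keeps molecules that share a core scaffold in the same subset, so held-out molecules come from chemotypes not seen in training. The greedy sketch below illustrates that general idea, assuming scaffold keys have already been computed (e.g. Bemis-Murcko scaffolds via a toolkit such as RDKit); `scaffold_split` and its default fractions are hypothetical and do not describe how `split_data` is implemented.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Return (train, valid, test) row indices; rows sharing a scaffold stay together.

    `scaffolds[i]` is a precomputed scaffold key for row i (hypothetical input;
    computing Bemis-Murcko scaffolds needs a toolkit such as RDKit).
    """
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)
    # Largest scaffold groups first (ties broken by first row index), so rare
    # scaffolds tend to end up in the validation and test sets.
    ordered = sorted(groups.values(), key=lambda g: (len(g), g[0]), reverse=True)
    n = len(scaffolds)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Toy scaffold keys; "" is the scaffold of an acyclic molecule.
keys = ["c1ccccc1"] * 5 + ["c1ccsc1"] * 2 + ["C1CCCCC1", "c1ccncc1", ""]
train_idx, valid_idx, test_idx = scaffold_split(keys)
print(train_idx, valid_idx, test_idx)
```

Whole groups are assigned at once, so the subset sizes only approximate the requested fractions; that is the price of never splitting a scaffold across subsets.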
2. Load the split data:
```python
>>> from bayesianflow_for_chem.data import smiles2token, collate, CSVData
>>> dataset = CSVData("delaney-processed_train.csv")
>>> dataset[0]
{'Compound ID': ['Thiophene'],
'ESOL predicted log solubility in mols per litre': ['-2.2319999999999998'],
'Minimum Degree': ['2'],
'Molecular Weight': ['84.14299999999999'],
'Number of H-Bond Donors': ['0'],
'Number of Rings': ['1'],
'Number of Rotatable Bonds': ['0'],
'Polar Surface Area': ['0.0'],
'measured log solubility in mols per litre': ['-1.33'],
'smiles': ['c1ccsc1']}
```
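In the output above, every field comes back as a string inside a single-element list, which is why the `encode` function in the next step reads `x["smiles"][0]` and converts the target column with `float`. The stdlib sketch below reproduces only that shape with `csv.DictReader`; `read_rows` is a hypothetical helper, not how `CSVData` is implemented.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text with a header row into one dict per data row.

    Each value is kept as a string inside a single-element list, matching
    only the shape of dataset[0] above (hypothetical helper, not CSVData).
    """
    reader = csv.DictReader(io.StringIO(text))
    return [{key: [value] for key, value in row.items()} for row in reader]


sample = (
    "Compound ID,smiles,measured log solubility in mols per litre\n"
    "Thiophene,c1ccsc1,-1.33\n"
)
rows = read_rows(sample)
print(rows[0]["smiles"][0])  # → c1ccsc1
# Values are strings, so numeric columns need an explicit conversion.
print(float(rows[0]["measured log solubility in mols per litre"][0]))  # → -1.33
```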
3. Create a mapping function to tokenise the dataset and select values:
```python
>>> import torch
>>> def encode(x):
...     smiles = x["smiles"][0]  # each field is a single-element list of strings
...     value = [float(i) for i in x["measured log solubility in mols per litre"]]
...     return {"token": smiles2token(smiles), "value": torch.tensor(value)}
...
>>> dataset.map(encode)
>>> dataset[0]
{'token': tensor([ 1, 151, 23, 151, 151, 154, 151, 23, 2]),
'value': tensor([-1.3300])}
```
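`smiles2token` turns a SMILES string into token ids framed by `1` and `2` in the output above. A common way to do this is regex-based tokenisation, sketched below. The pattern, `BOS`/`EOS`, `encode_smiles` and the toy vocabulary are assumptions for illustration and do not reproduce the package's vocabulary (in the output above, `c` appears as 151).

```python
import re

# Widely used regex for atom-level SMILES tokenisation (assumed here; not
# necessarily the rule behind smiles2token). Bracket atoms stay whole.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
BOS, EOS = 1, 2  # hypothetical start/end ids, echoing the 1 ... 2 framing above


def tokenize(smiles):
    """Split a SMILES string into tokens, refusing characters the regex skips."""
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenisable characters in {smiles!r}")
    return tokens


def encode_smiles(smiles, vocab):
    """Map tokens to ids with a given vocabulary and add BOS/EOS."""
    return [BOS] + [vocab[t] for t in tokenize(smiles)] + [EOS]


toy_vocab = {"c": 3, "1": 4, "s": 5}  # hypothetical ids, not the package's
print(tokenize("c1ccsc1"))  # → ['c', '1', 'c', 'c', 's', 'c', '1']
print(encode_smiles("c1ccsc1", toy_vocab))  # → [1, 3, 4, 3, 3, 5, 3, 4, 2]
```

The toy output has the same structure as `tensor([ 1, 151, 23, 151, 151, 154, 151, 23, 2])`: every aromatic `c` shares one id, and both ring-closure `1` tokens share another.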
4. Wrap the dataset in a `torch.utils.data.DataLoader`:
```python
>>> dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, collate_fn=collate)
```
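`DataLoader` groups samples into batches and passes each batch, a list of the dicts produced by `encode`, to `collate_fn`. Token sequences differ in length, so a collate function usually pads them to a common length before stacking. The package's `collate` is not shown here; the sketch below is a hypothetical pure-Python stand-in (plain lists instead of tensors, padding id `0` assumed) that illustrates the idea.

```python
def pad_collate(batch, pad_id=0):
    """Right-pad each token list to the longest one in the batch.

    `batch` is a list of dicts shaped like {"token": [...], "value": [...]}.
    pad_id=0 is an assumption, not necessarily the package's padding id.
    """
    max_len = max(len(item["token"]) for item in batch)
    tokens = [item["token"] + [pad_id] * (max_len - len(item["token"])) for item in batch]
    values = [item["value"] for item in batch]
    return {"token": tokens, "value": values}


batch = [
    {"token": [1, 3, 4, 3, 3, 5, 3, 4, 2], "value": [-1.33]},
    {"token": [1, 3, 2], "value": [0.5]},
]
out = pad_collate(batch)
print(out["token"][1])  # → [1, 3, 2, 0, 0, 0, 0, 0, 0]
```

Padding per batch rather than to a global maximum keeps short-molecule batches small.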
## Cite This Work
```bibtex
@article{2025chembfn,
title={Bayesian Flow Network Framework for Chemistry Tasks},
author={Tao, Nianze and Abe, Minori},
journal={Journal of Chemical Information and Modeling},
volume={65},
number={3},
pages={1178-1187},
year={2025},
doi={10.1021/acs.jcim.4c01792},
}
```
Out-of-distribution generation:
```bibtex
@misc{2024chembfn_ood,
title={Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces},
author={Nianze Tao},
year={2024},
eprint={2412.11439},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2412.11439},
}
```