{"id":20226657,"url":"https://github.com/augus1999/bayesian-flow-network-for-chemistry","last_synced_at":"2025-04-10T17:08:37.060Z","repository":{"id":250465262,"uuid":"810246651","full_name":"Augus1999/bayesian-flow-network-for-chemistry","owner":"Augus1999","description":"ChemBFN: Bayesian Flow Network Framework for Chemistry Tasks. Developed in Hiroshima University.","archived":false,"fork":false,"pushed_at":"2025-02-16T02:38:49.000Z","size":290,"stargazers_count":20,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-20T08:19:20.286Z","etag":null,"topics":["bayesian-flow-networks","cheminformatics","chemistry","generative-model","machine-learning","molecular-modeling","qsar","qspr","transformer"],"latest_commit_sha":null,"homepage":"https://augus1999.github.io/bayesian-flow-network-for-chemistry/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Augus1999.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-04T10:28:05.000Z","updated_at":"2025-02-16T02:38:52.000Z","dependencies_parsed_at":"2024-12-17T03:27:41.788Z","dependency_job_id":"4cf26118-6249-46ab-a447-5b14d2f845b0","html_url":"https://github.com/Augus1999/bayesian-flow-network-for-chemistry","commit_stats":null,"previous_names":["augus1999/bayesian-flow-network-for-chemistry"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Augus1999%2Fbayesian-flow-network-for-chemistry","tags_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/Augus1999%2Fbayesian-flow-network-for-chemistry/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Augus1999%2Fbayesian-flow-network-for-chemistry/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Augus1999%2Fbayesian-flow-network-for-chemistry/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Augus1999","download_url":"https://codeload.github.com/Augus1999/bayesian-flow-network-for-chemistry/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248260762,"owners_count":21074215,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-flow-networks","cheminformatics","chemistry","generative-model","machine-learning","molecular-modeling","qsar","qspr","transformer"],"created_at":"2024-11-14T07:19:33.905Z","updated_at":"2025-04-10T17:08:37.051Z","avatar_url":"https://github.com/Augus1999.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ChemBFN: Bayesian Flow Network for Chemistry\n\n[![DOI](https://zenodo.org/badge/DOI/10.1021/acs.jcim.4c01792.svg)](https://doi.org/10.1021/acs.jcim.4c01792)\n[![arxiv](https://img.shields.io/badge/arXiv-2412.11439-red)](https://arxiv.org/abs/2412.11439)\n\nThis is the repository of the PyTorch implementation of ChemBFN model.\n\n## Features\n\nChemBFN provides the state-of-the-art functionalities of\n* SMILES or SELFIES-based *de novo* molecule generation\n* Protein sequence *de novo* generation\n* 
Classifier-free guidance conditional generation (single or multi-objective optimisation)\n* Context-guided conditional generation (inpaint)\n* Outstanding out-of-distribution chemical space sampling\n* Fast sampling via ODE solver\n* Molecular property and activity prediction finetuning\n* Reaction yield prediction finetuning\n\nin an all-in-one-model style.\n\n## News\n\n* [30/01/2025] The package `bayesianflow_for_chem` is available on [PyPI](https://pypi.org/project/bayesianflow-for-chem/).\n* [21/01/2025] Our first paper has been accepted by [JCIM](https://pubs.acs.org/doi/10.1021/acs.jcim.4c01792).\n* [17/12/2024] The second paper, on out-of-distribution generation, is available on [arxiv.org](https://arxiv.org/abs/2412.11439).\n* [31/07/2024] Paper is available on [arxiv.org](https://arxiv.org/abs/2407.20294).\n* [21/07/2024] Paper was submitted to arXiv.\n\n## Install\n\n```bash\n$ pip install -U bayesianflow_for_chem\n```\n\n## Usage\n\nYou can find example scripts in the [📁example](./example) folder.\n\n## Pre-trained Model\n\nYou can find pretrained models in [release](https://github.com/Augus1999/bayesian-flow-network-for-chemistry/releases) or on our [🤗Hugging Face model page](https://huggingface.co/suenoomozawa/ChemBFN).\n\n## Dataset Handling\n\nWe provide a Python class [`CSVData`](./bayesianflow_for_chem/data.py) to handle data stored in CSV or a similar format containing headers to identify the entities. The following is a quickstart.\n\n1. Download your dataset file (e.g., ESOL from [MoleculeNet](https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/delaney-processed.csv)) and split the file:\n```python\n\u003e\u003e\u003e from bayesianflow_for_chem.tool import split_data\n\n\u003e\u003e\u003e split_data(\"delaney-processed.csv\", method=\"scaffold\")\n```\n\n2. 
Load the split data:\n```python\n\u003e\u003e\u003e from bayesianflow_for_chem.data import smiles2token, collate, CSVData\n\n\u003e\u003e\u003e dataset = CSVData(\"delaney-processed_train.csv\")\n\u003e\u003e\u003e dataset[0]\n{'Compound ID': ['Thiophene'], \n'ESOL predicted log solubility in mols per litre': ['-2.2319999999999998'], \n'Minimum Degree': ['2'], \n'Molecular Weight': ['84.14299999999999'], \n'Number of H-Bond Donors': ['0'], \n'Number of Rings': ['1'], \n'Number of Rotatable Bonds': ['0'], \n'Polar Surface Area': ['0.0'], \n'measured log solubility in mols per litre': ['-1.33'], \n'smiles': ['c1ccsc1']}\n```\n\n3. Create a mapping function to tokenise the dataset and select values:\n```python\n\u003e\u003e\u003e import torch\n\n\u003e\u003e\u003e def encode(x):\n...   smiles = x[\"smiles\"][0]\n...   value = [float(i) for i in x[\"measured log solubility in mols per litre\"]]\n...   return {\"token\": smiles2token(smiles), \"value\": torch.tensor(value)}\n\n\u003e\u003e\u003e dataset.map(encode)\n\u003e\u003e\u003e dataset[0]\n{'token': tensor([  1, 151,  23, 151, 151, 154, 151,  23,   2]), \n'value': tensor([-1.3300])}\n```\n\n4. 
Wrap the dataset in \u003cu\u003etorch.utils.data.DataLoader\u003c/u\u003e:\n```python\n\u003e\u003e\u003e dataloader = torch.utils.data.DataLoader(dataset, 32, collate_fn=collate)\n```\n\n## Cite This Work\n\n```bibtex\n@article{2025chembfn,\n    title={Bayesian Flow Network Framework for Chemistry Tasks},\n    author={Tao, Nianze and Abe, Minori},\n    journal={Journal of Chemical Information and Modeling},\n    volume={65},\n    number={3},\n    pages={1178-1187},\n    year={2025},\n    doi={10.1021/acs.jcim.4c01792},\n}\n```\nOut-of-distribution generation:\n```bibtex\n@misc{2024chembfn_ood,\n    title={Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces}, \n    author={Nianze Tao},\n    year={2024},\n    eprint={2412.11439},\n    archivePrefix={arXiv},\n    primaryClass={cs.LG},\n    url={https://arxiv.org/abs/2412.11439}, \n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faugus1999%2Fbayesian-flow-network-for-chemistry","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faugus1999%2Fbayesian-flow-network-for-chemistry","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faugus1999%2Fbayesian-flow-network-for-chemistry/lists"}