https://github.com/qizhipei/biot5
BioT5 (EMNLP 2023) and BioT5+ (ACL 2024 Findings)
- Host: GitHub
- URL: https://github.com/qizhipei/biot5
- Owner: QizhiPei
- License: mit
- Created: 2023-10-11T09:00:33.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-14T14:01:09.000Z (almost 2 years ago)
- Last Synced: 2025-12-06T20:21:45.221Z (7 months ago)
- Topics: bioinformatics, computational-biology, cross-modal, machine-learning, nlp, nlp-applications
- Language: Python
- Homepage: https://arxiv.org/abs/2310.07276
- Size: 1.75 MB
- Stars: 122
- Watchers: 3
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations 🔥
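[BioT5 paper (arXiv:2310.07276)](https://arxiv.org/abs/2310.07276)
[BioT5+ paper (arXiv:2402.17810)](https://arxiv.org/abs/2402.17810)
[GitHub repository](https://github.com/QizhiPei/BioT5)
[Hugging Face model: QizhiPei/biot5-base](https://huggingface.co/QizhiPei/biot5-base)
[Hugging Face dataset: QizhiPei/BioT5_finetune_dataset](https://huggingface.co/datasets/QizhiPei/BioT5_finetune_dataset)
[Awesome-Biomolecule-Language-Cross-Modeling](https://github.com/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling)
[PyTorch installation](https://pytorch.org/get-started/locally/)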
## News
🎉***July 18 2024***: *Happy to share that our [enhanced version of BioT5+](https://openreview.net/forum?id=Fib0IJt8YW) ranked **1st** in the Text-based Molecule Generation track and **2nd** in the Molecular Captioning track at the [Language + Molecule @ ACL2024 Competition](https://language-plus-molecules.github.io/#leaderboard).*
🔥***July 11 2024***: *Data, code, and pre-trained models for BioT5+ are released.*
🔥***May 16 2024***: *[BioT5+](https://arxiv.org/abs/2402.17810) was accepted to ACL 2024 (Findings).*
🔥***Mar 03 2024***: *We have published a survey paper, [Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey](https://arxiv.org/abs/2403.01528), and the related GitHub repository [Awesome-Biomolecule-Language-Cross-Modeling](https://github.com/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling). Please check them out if you are interested in this field.*
🔥***Feb 29 2024***: *Updated [BioT5](https://arxiv.org/abs/2310.07276) to [BioT5+](https://arxiv.org/abs/2402.17810), which adds IUPAC integration and multi-task learning!*
🔥***Nov 06 2023***: *Update [example usage](#example-usage) for molecule captioning, text-based molecule generation, drug-target interaction prediction!*
🔥***Oct 20 2023***: *The [data](#data) for fine-tuning is released!*
🔥***Oct 19 2023***: *The pre-trained and fine-tuned [models](#models) are released!*
🔥***Oct 11 2023***: *Initial commits. More code, pre-trained models, and data are coming soon.*
## Overview
This repository contains the source code for
* *EMNLP 2023* paper "[BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations](https://arxiv.org/abs/2310.07276)", by Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, and Rui Yan. BioT5 achieves superior performance on various biological tasks.
* *ACL 2024 (Findings)* paper "[BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning](https://arxiv.org/abs/2402.17810)", by Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, and Rui Yan. BioT5+ is pre-trained, then fine-tuned and evaluated across **3 types of problems (classification, regression, generation), 15 kinds of tasks, and 21 total benchmark datasets**, achieving state-of-the-art results in most cases.
* If you have questions, don't hesitate to open an issue or contact Qizhi Pei or Lijun Wu. We are happy to hear from you!
**↓Overview of BioT5**

**↓Overview of BioT5+**

**Please refer to the `biot5` or `biot5_plus` folder for detailed instructions.**
## Citations
### BioT5
```
@inproceedings{pei2023biot5,
title={BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations},
author={Pei, Qizhi and Zhang, Wei and Zhu, Jinhua and Wu, Kehan and Gao, Kaiyuan and Wu, Lijun and Xia, Yingce and Yan, Rui},
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.70",
pages = "1102--1123"
}
```
### BioT5+
```
@article{pei2024biot5+,
title={BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning},
author={Pei, Qizhi and Wu, Lijun and Gao, Kaiyuan and Liang, Xiaozhuan and Fang, Yin and Zhu, Jinhua and Xie, Shufang and Qin, Tao and Yan, Rui},
journal={arXiv preprint arXiv:2402.17810},
year={2024}
}
```
## Acknowledgments
The code is based on [nanoT5](https://github.com/PiotrNawrot/nanoT5).