{"id":45952790,"url":"https://github.com/qizhipei/biot5","last_synced_at":"2026-02-28T13:01:51.004Z","repository":{"id":199756136,"uuid":"703445100","full_name":"QizhiPei/BioT5","owner":"QizhiPei","description":"BioT5 (EMNLP 2023) and BioT5+ (ACL 2024 Findings)","archived":false,"fork":false,"pushed_at":"2024-09-14T14:01:09.000Z","size":1838,"stargazers_count":122,"open_issues_count":0,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-12-06T20:21:45.221Z","etag":null,"topics":["bioinformatics","computational-biology","cross-modal","machine-learning","nlp","nlp-applications"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2310.07276","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QizhiPei.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-11T09:00:33.000Z","updated_at":"2025-11-21T09:29:45.000Z","dependencies_parsed_at":"2023-10-12T05:27:07.056Z","dependency_job_id":"c776eaa3-f30f-4e07-80ca-75631919f0f2","html_url":"https://github.com/QizhiPei/BioT5","commit_stats":{"total_commits":27,"total_committers":2,"mean_commits":13.5,"dds":0.2962962962962963,"last_synced_commit":"e088e6d3ab6f17487af6188cee7d4e68344c8b3c"},"previous_names":["qizhipei/biot5"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/QizhiPei/BioT5","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QizhiPei%2FBioT5","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QizhiPei%2FBioT5/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/QizhiPei%2FBioT5/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QizhiPei%2FBioT5/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/QizhiPei","download_url":"https://codeload.github.com/QizhiPei/BioT5/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QizhiPei%2FBioT5/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29934956,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-28T13:00:17.143Z","status":"ssl_error","status_checked_at":"2026-02-28T12:59:13.669Z","response_time":90,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","computational-biology","cross-modal","machine-learning","nlp","nlp-applications"],"created_at":"2026-02-28T13:01:48.899Z","updated_at":"2026-02-28T13:01:50.986Z","avatar_url":"https://github.com/QizhiPei.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\nBioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations 🔥\n\u003c/h1\u003e\n\n\u003cdiv 
align=\"center\"\u003e\n\n[![](https://img.shields.io/badge/BioT5-arxiv2310.07276-red?style=plastic\u0026logo=GitBook)](https://arxiv.org/abs/2310.07276)\n[![](https://img.shields.io/badge/BioT5+-arxiv2402.17810-red?style=plastic\u0026logo=GitBook)](https://arxiv.org/abs/2402.17810)\n[![](https://img.shields.io/badge/github-green?style=plastic\u0026logo=github)](https://github.com/QizhiPei/BioT5) \n[![](https://img.shields.io/badge/model-pink?style=plastic\u0026logo=themodelsresource)](https://huggingface.co/QizhiPei/biot5-base) \n[![](https://img.shields.io/badge/dataset-orange?style=plastic\u0026logo=data.ai)](https://huggingface.co/datasets/QizhiPei/BioT5_finetune_dataset)\n[![](https://img.shields.io/badge/Awesome-Biomolecule_Language_Cross_Modeling-orange?style=plastic\u0026logo=awesomelists)](https://github.com/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling)\n[![](https://img.shields.io/badge/PyTorch-1.13+-ee4c2c?logo=pytorch\u0026logoColor=white)](https://pytorch.org/get-started/locally/)\n\n\u003c/div\u003e\n\n\n## News\n🎉***July 18 2024***: *Happy to share that our [enhanced version of BioT5+](https://openreview.net/forum?id=Fib0IJt8YW) placed **1st** in the Text-based Molecule Generation track and **2nd** in the Molecular Captioning track at the [Language + Molecule @ ACL2024 Competition](https://language-plus-molecules.github.io/#leaderboard)*\n\n🔥***July 11 2024***: *Data, code, and pre-trained models for BioT5+ are released.*\n\n🔥***May 16 2024***: *[BioT5+](https://arxiv.org/abs/2402.17810) has been accepted to ACL 2024 (Findings).*\n\n🔥***Mar 03 2024***: *We have published a survey paper [Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey](https://arxiv.org/abs/2403.01528) and the related GitHub repository [Awesome-Biomolecule-Language-Cross-Modeling](https://github.com/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling). 
Please check it out if you are interested in this field~*\n\n🔥***Feb 29 2024***: *Updated [BioT5](https://arxiv.org/abs/2310.07276) to [BioT5+](https://arxiv.org/abs/2402.17810) with IUPAC integration and multi-task learning!*\n\n🔥***Nov 06 2023***: *Updated [example usage](#example-usage) for molecule captioning, text-based molecule generation, and drug-target interaction prediction!*\n\n🔥***Oct 20 2023***: *The [data](#data) for fine-tuning is released!*\n\n🔥***Oct 19 2023***: *The pre-trained and fine-tuned [models](#models) are released!*\n\n🔥***Oct 11 2023***: *Initial commits. More code, pre-trained models, and data are coming soon.*\n\n\n## Overview\nThis repository contains the source code for:\n\n* The *EMNLP 2023* paper \"[BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations](https://arxiv.org/abs/2310.07276)\", by Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, and Rui Yan. BioT5 achieves superior performance on various biological tasks.\n* The *ACL 2024 (Findings)* paper \"[BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning](https://arxiv.org/abs/2402.17810)\", by Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, and Rui Yan. BioT5+ is pre-trained and fine-tuned across a large number of experiments, covering **3 types of problems (classification, regression, generation), 15 kinds of tasks, and 21 total benchmark datasets**, and achieves state-of-the-art results in most cases.\n* If you have questions, don't hesitate to open an issue or email me at \u003cqizhipei@ruc.edu.cn\u003e or Lijun Wu at \u003clijun_wu@outlook.com\u003e. 
We are happy to hear from you!\n\n**↓Overview of BioT5**\n![](./imgs/overview_biot5.png)\n\n**↓Overview of BioT5+**\n![](./imgs/overview_biot5+.png)\n\n**Please refer to the `biot5` or `biot5_plus` folder for detailed instructions.**\n\n## Citations\n### BioT5\n```\n@inproceedings{pei2023biot5,\n  title={BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations},\n  author={Pei, Qizhi and Zhang, Wei and Zhu, Jinhua and Wu, Kehan and Gao, Kaiyuan and Wu, Lijun and Xia, Yingce and Yan, Rui},\n  booktitle = \"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing\",\n  month = dec,\n  year = \"2023\",\n  publisher = \"Association for Computational Linguistics\",\n  url = \"https://aclanthology.org/2023.emnlp-main.70\",\n  pages = \"1102--1123\"\n}\n```\n### BioT5+\n```\n@article{pei2024biot5+,\n  title={BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning},\n  author={Pei, Qizhi and Wu, Lijun and Gao, Kaiyuan and Liang, Xiaozhuan and Fang, Yin and Zhu, Jinhua and Xie, Shufang and Qin, Tao and Yan, Rui},\n  journal={arXiv preprint arXiv:2402.17810},\n  year={2024}\n}\n```\n\n## Acknowledgments\nThe code is based on [nanoT5](https://github.com/PiotrNawrot/nanoT5).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqizhipei%2Fbiot5","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqizhipei%2Fbiot5","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqizhipei%2Fbiot5/lists"}