{"id":13752379,"url":"https://github.com/blazerye/DrugAssist","last_synced_at":"2025-05-09T19:32:01.114Z","repository":{"id":218467468,"uuid":"736182169","full_name":"blazerye/DrugAssist","owner":"blazerye","description":"[Briefings In Bioinformatics] DrugAssist: A Large Language Model for Molecule Optimization","archived":false,"fork":false,"pushed_at":"2025-04-01T08:24:19.000Z","size":7373,"stargazers_count":126,"open_issues_count":0,"forks_count":13,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-01T09:28:36.264Z","etag":null,"topics":["ai-for-science","drug-discovery","instruction-datasets","instruction-tuning","large-language-models","molecule-generation","molecule-optimization"],"latest_commit_sha":null,"homepage":"https://academic.oup.com/bib/article/26/1/bbae693/7942355","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blazerye.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-27T07:48:51.000Z","updated_at":"2025-04-01T08:24:22.000Z","dependencies_parsed_at":"2024-11-16T05:30:39.472Z","dependency_job_id":"058f7f3b-7694-42e2-a004-679fe057312f","html_url":"https://github.com/blazerye/DrugAssist","commit_stats":null,"previous_names":["blazerye/drugassist"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blazerye%2FDrugAssist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blazerye%2FDrugAssist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bl
azerye%2FDrugAssist/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blazerye%2FDrugAssist/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blazerye","download_url":"https://codeload.github.com/blazerye/DrugAssist/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253312299,"owners_count":21888616,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-for-science","drug-discovery","instruction-datasets","instruction-tuning","large-language-models","molecule-generation","molecule-optimization"],"created_at":"2024-08-03T09:01:04.890Z","updated_at":"2025-05-09T19:32:01.102Z","avatar_url":"https://github.com/blazerye.png","language":"Python","funding_links":[],"categories":["Datasets \u0026 Benchmarks","🔬 Domain-Specific Applications","Ranked by starred repositories"],"sub_categories":["Text + BioMulti","🧬 Biology \u0026 Medicine"],"readme":"\u003ch1 align=\"center\"\u003e 🐹 DrugAssist  \u003c/h1\u003e\n\u003ch3 align=\"center\"\u003e A Large Language Model for Molecule Optimization \u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\n  📃 \u003ca href=\"https://academic.oup.com/bib/article/26/1/bbae693/7942355\" target=\"_blank\"\u003ePaper\u003c/a\u003e • 🤗 \u003ca href=\"https://huggingface.co/datasets/blazerye/MolOpt-Instructions\" target=\"_blank\"\u003eDataset\u003c/a\u003e • 🤗 \u003ca href=\"https://huggingface.co/blazerye/DrugAssist-7B\" target=\"_blank\"\u003eModel\u003c/a\u003e\u003cbr\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n  
\u003cimg src=\"fig/logo.png\" width=\"200\"\u003e\n\u003c/div\u003e\n\n## 📌 Contents\n- [Install](#install)\n- [Dataset](#dataset)\n- [Train](#train)\n- [Demo](#demo)\n- [Evaluate](#evaluate)\n- [About](#about)\n\n## 🛠️ Install\n1. Clone this repository and navigate to the DrugAssist folder\n```bash\ngit clone https://github.com/blazerye/DrugAssist.git\ncd DrugAssist\n```\n\n2. Install the dependencies\n```Shell\nconda create -n drugassist python=3.8 -y\nconda activate drugassist\npip install -r requirements.txt\n```\n\n## 🤗 Dataset\nThe dataset is available on Hugging Face at [blazerye/MolOpt-Instructions](https://huggingface.co/datasets/blazerye/MolOpt-Instructions) and can be used for training.\n\n## 🚆 Train\nYou can use LoRA to fine-tune the `Llama2-7B-Chat` model on the `MolOpt-Instructions` dataset by running the following command:\n```Shell\nsh run_sft_lora.sh\n```\n\n## 👀 Demo\n#### Step 1: Merge model weights\nYou can merge the LoRA weights into the base model to produce the full model weights using the following command:\n```Shell\npython merge_model.py \\\n    --base_model $BASE_MODEL_PATH \\\n    --lora_model $LORA_MODEL_PATH \\\n    --output_dir $OUTPUT_DIR \\\n    --output_type huggingface \\\n    --verbose\n```\nAlternatively, you can download our DrugAssist model weights from [blazerye/DrugAssist-7B](https://huggingface.co/blazerye/DrugAssist-7B).\n\n#### Step 2: Launch web demo\nYou can use Gradio to launch the web demo by running the following command:\n```Shell\npython gradio_service.py \\\n    --base_model $FULL_MODEL_PATH \\\n    --ip $IP \\\n    --port $PORT\n```\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"fig/demo.png\" width=\"500\"\u003e\n\u003c/div\u003e\n\n#### Deploy the Quantized Model and Use Text-Generation-WebUI For Inference\nTo deploy the DrugAssist model on devices with limited hardware (such as personal laptops without GPUs), we used [llama.cpp](https://github.com/ggerganov/llama.cpp) to perform 4-bit quantization on the [DrugAssist-7B](https://huggingface.co/blazerye/DrugAssist-7B) model, resulting in the [DrugAssist-7B-4bit](https://huggingface.co/blazerye/DrugAssist-7B/blob/main/DrugAssist-7B-4bit.gguf) model. You can load and run this quantized model with the text-generation-webui tool. For detailed steps, refer to [quantized_model_deploy.md](./quantized_model_deploy.md).\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"fig/webui.png\" width=\"700\"\u003e\n\u003c/div\u003e\n\n## ⚖️ Evaluate\nAfter deploying the [DrugAssist-7B](https://huggingface.co/blazerye/DrugAssist-7B) model, refer to [evaluate.md](./evaluate/evaluate.md) and run the evaluation script to verify the molecule optimization results.\n\n## 📝 About\n### Citation\nIf you find DrugAssist useful for your research and applications, please cite it using this BibTeX entry:\n```bibtex\n@article{ye2025drugassist,\n  title={DrugAssist: A large language model for molecule optimization},\n  author={Ye, Geyan and Cai, Xibao and Lai, Houtim and Wang, Xing and Huang, Junhong and Wang, Longyue and Liu, Wei and Zeng, Xiangxiang},\n  journal={Briefings in Bioinformatics},\n  volume={26},\n  number={1},\n  pages={bbae693},\n  year={2025},\n  publisher={Oxford University Press}\n}\n```\n### Acknowledgements\nWe thank [LLaMA](https://github.com/facebookresearch/llama), [Chinese-LLaMA-Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2), [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html), [iDrug](https://drug.ai.tencent.com) and many other related works for their open-source contributions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblazerye%2FDrugAssist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblazerye%2FDrugAssist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblazerye%2FDrugAssist/lists"}