DrugAssist: A Large Language Model for Molecule Optimization
https://github.com/blazerye/DrugAssist

ai-for-science drug-discovery instruction-datasets instruction-tuning large-language-models molecule-generation molecule-optimization


Host: GitHub
URL: https://github.com/blazerye/DrugAssist
Owner: blazerye
Created: 2023-12-27T07:48:51.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-07-26T03:27:41.000Z (4 months ago)
Last Synced: 2024-07-27T04:01:59.767Z (4 months ago)
Topics: ai-for-science, drug-discovery, instruction-datasets, instruction-tuning, large-language-models, molecule-generation, molecule-optimization
Language: Python
Homepage: https://arxiv.org/abs/2401.10334
Size: 7.02 MB
Stars: 126
Watchers: 3
Forks: 10
Open Issues: 4
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

top-life-sciences - **blazerye/DrugAssist** - `ai-for-science`, `drug-discovery`, `instruction-datasets`, `instruction-tuning`, `large-language-models`, `molecule-generation`, `molecule-optimization` | Stars: 123 | Forks: 10 | Watchers: 3 | Language: Python | (Ranked by starred repositories)

README

        
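# 🐹 DrugAssist

A Large Language Model for Molecule Optimization

[📃 Paper](https://arxiv.org/abs/2401.10334) • [🤗 Dataset](https://huggingface.co/datasets/blazerye/MolOpt-Instructions) • [🤗 Model](https://huggingface.co/blazerye/DrugAssist-7B)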

## 📌 Contents

- [Install](#install)
- [Dataset](#dataset)
- [Train](#train)
- [Demo](#demo)
- [Evaluate](#evaluate)
- [About](#about)

## 🛠️ Install

1. Clone this repository and navigate to the DrugAssist folder:

```bash
git clone https://github.com/blazerye/DrugAssist.git
cd DrugAssist
```

2. Create a conda environment and install the dependencies:

```Shell
conda create -n drugassist python=3.8 -y
conda activate drugassist
pip install -r requirements.txt
```
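
After installation, it can be useful to confirm that the environment actually contains the packages listed in `requirements.txt`. The following is a minimal sketch, not part of the repository: the helper names `requirement_names` and `missing_packages` are hypothetical, and it assumes the file uses simple `name==version` style lines rather than URLs or editable installs.

```python
import re
from importlib import metadata  # standard library since Python 3.8


def requirement_names(lines):
    """Extract bare package names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Keep the leading name, stopping at version specifiers or extras.
        match = re.match(r"[A-Za-z0-9._-]+", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Inside the activated `drugassist` environment, calling `missing_packages(requirement_names(open("requirements.txt")))` should return an empty list if every listed package was installed.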

## 🤗 Dataset

We release the dataset on Hugging Face at [blazerye/MolOpt-Instructions](https://huggingface.co/datasets/blazerye/MolOpt-Instructions), and you can use it for training.
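
To inspect the data before training, one option is the Hugging Face `datasets` library. This is a hedged sketch: it assumes `datasets` is installed and that the dataset loads with its default configuration. The helpers `load_molopt` and `preview` are hypothetical names, and the code deliberately makes no assumption about the dataset's field names.

```python
DATASET_ID = "blazerye/MolOpt-Instructions"  # dataset ID from the link above


def load_molopt(split=None):
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so this module can be loaded without the dependency.
    from datasets import load_dataset  # third-party: pip install datasets
    return load_dataset(DATASET_ID, split=split)


def preview(records, limit=3):
    """Return the first few records as plain dicts, whatever their fields are."""
    rows = []
    for index, record in enumerate(records):
        if index >= limit:
            break
        rows.append(dict(record))
    return rows
```

With `split=None`, `load_dataset` returns a mapping of split names to datasets, so printing the result of `load_molopt()` shows which splits and columns are available before you pass one split to `preview`.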

## 🚆 Train

You can fine-tune the `Llama2-7B-Chat` model with LoRA on the `MolOpt-Instructions` dataset by running the following command:

```Shell
sh run_sft_lora.sh
```

## 👀 Demo

#### Step 1: Merge model weights

You can merge LoRA weights to generate full model weights using the following command:

```Shell
python merge_model.py \
    --base_model $BASE_MODEL_PATH \
    --lora_model $LORA_MODEL_PATH \
    --output_dir $OUTPUT_DIR \
    --output_type huggingface \
    --verbose
```

Alternatively, you can download our DrugAssist model weights from [blazerye/DrugAssist-7B](https://huggingface.co/blazerye/DrugAssist-7B).
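
Conceptually, merging LoRA weights folds each low-rank update into the corresponding base weight matrix: W' = W + (alpha / r) · B · A. The sketch below illustrates that standard LoRA formula on a toy matrix in pure Python. It is not a description of what `merge_model.py` does internally, and the function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the standard LoRA merge."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 weight with a rank-1 update (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]
merged = merge_lora(W, B, A, alpha=2, rank=1)
print(merged)  # [[2.0, 1.0], [2.0, 3.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged output can be loaded as an ordinary full model.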

#### Step 2: Launch web demo

You can launch a Gradio web demo by running the following command:

```Shell
python gradio_service.py \
    --base_model $FULL_MODEL_PATH \
    --ip $IP \
    --port $PORT
```
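
If you prefer to start the demo from Python, for example from a launcher script, the same command can be assembled programmatically. This is a sketch under assumptions: `demo_command` is a hypothetical helper, and the default IP `127.0.0.1` and port `7860` are illustrative choices rather than values taken from the repository. Only the flags shown in the command above are used.

```python
import os
import sys


def demo_command(model_path, ip="127.0.0.1", port=7860):
    """Build the argv list for gradio_service.py with basic port validation."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return [
        sys.executable, "gradio_service.py",
        "--base_model", os.fspath(model_path),
        "--ip", str(ip),
        "--port", str(port),
    ]
```

Passing the resulting list to `subprocess.run(..., check=True)` from the repository root would launch the demo. Building the command as a list avoids shell quoting problems when the model path contains spaces.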



  



#### Deploy the Quantized Model and Use Text-Generation-WebUI For Inference

To run DrugAssist on devices with limited hardware (such as personal laptops without GPUs), we used [llama.cpp](https://github.com/ggerganov/llama.cpp) to apply 4-bit quantization to the [DrugAssist-7B](https://huggingface.co/blazerye/DrugAssist-7B) model, producing the [DrugAssist-7B-4bit](https://huggingface.co/blazerye/DrugAssist-7B/blob/main/DrugAssist-7B-4bit.gguf) model. You can load and run this quantized model with the text-generation-webui tool. For detailed steps, see [quantized_model_deploy.md](./quantized_model_deploy.md).
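
To see why 4-bit quantization matters on low-end hardware, a back-of-the-envelope estimate of weight storage is enough. The sketch below assumes a nominal 7 billion parameters and counts only the weights. Real 4-bit GGUF files are typically somewhat larger, because quantization formats store per-block scale factors and may keep some tensors at higher precision, and inference also needs memory for activations and the KV cache.

```python
def weight_size_gib(n_params, bits_per_weight):
    """Approximate storage for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 7e9  # nominal "7B"; the exact parameter count differs slightly

fp16 = weight_size_gib(N_PARAMS, 16)
int4 = weight_size_gib(N_PARAMS, 4)
print(f"fp16: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB")  # fp16: 13.0 GiB, 4-bit: 3.3 GiB
```

Under these assumptions the quantized weights need roughly a quarter of the memory of the 16-bit model, which brings them within reach of a typical laptop's RAM.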



  



## ⚖️ Evaluate

After deploying the [DrugAssist-7B](https://huggingface.co/blazerye/DrugAssist-7B) model, follow the [evaluate.md](./evaluate/evaluate.md) document and run the evaluation script to verify the molecule optimization results.

## 📝 About

### Citation

If you find DrugAssist useful for your research and applications, please cite using this BibTeX:

```bibtex
@article{ye2023drugassist,
  title={DrugAssist: A Large Language Model for Molecule Optimization},
  author={Ye, Geyan and Cai, Xibao and Lai, Houtim and Wang, Xing and Huang, Junhong and Wang, Longyue and Liu, Wei and Zeng, Xiangxiang},
  journal={arXiv preprint arXiv:2401.10334},
  year={2023}
}
```

### Acknowledgements

We appreciate [LLaMA](https://github.com/facebookresearch/llama), [Chinese-LLaMA-Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2), [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html), [iDrug](https://drug.ai.tencent.com) and many other related works for their open-source contributions.