https://github.com/georgesung/llm_qlora
Fine-tuning LLMs using QLoRA
- Host: GitHub
- URL: https://github.com/georgesung/llm_qlora
- Owner: georgesung
- License: mit
- Created: 2023-07-03T01:51:45.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-08T13:11:13.000Z (7 months ago)
- Last Synced: 2024-07-31T20:32:18.012Z (6 months ago)
- Language: Jupyter Notebook
- Size: 38.1 KB
- Stars: 224
- Watchers: 4
- Forks: 50
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- Awesome-LLM-Productization - LLM QLoRA - Fine-tuning LLMs using QLoRA (Models and Tools / LLM Finetuning)
- awesome-LLMs-finetuning - llm_qlora - tuning LLMs using QLoRA. :star: 136 (4. Fine-Tuning / Frameworks)
- awesome-llms-fine-tuning - llm_qlora - tuning LLMs using QLoRA. :star: 136 (GitHub projects)
README
# Fine-tuning LLMs using QLoRA
## Setup
First, make sure you are using Python 3.8+. If you're using Python 3.7, see the Troubleshooting section below. Then install the dependencies:

`pip install -r requirements.txt`
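To fail fast before installing anything, the version requirement can be checked from Python itself. This is a minimal sketch that only compares the interpreter's (major, minor) version against the 3.8 minimum stated above; `check_python_version` and `MIN_PYTHON` are hypothetical names, not part of this repo.

```python
import sys

# Minimum version stated in the Setup section.
MIN_PYTHON = (3, 8)


def check_python_version(version_info=sys.version_info):
    """Return True if (major, minor) of version_info is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if check_python_version():
        print("Python version OK, next run: pip install -r requirements.txt")
    else:
        print("Python older than 3.8 detected, see the Troubleshooting section")
```

Comparing tuples keeps the check correct for two-digit minor versions (3.10 and later), which a string comparison would get wrong.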
## Run training
```
python train.py
```

For example, to fine-tune Llama3-8B on the wizard_vicuna_70k_unfiltered dataset, run:
```
python train.py configs/llama3_8b_chat_uncensored.yaml
```

## Push model to HuggingFace Hub
Follow the instructions [here](https://huggingface.co/docs/hub/repositories-getting-started#terminal).

## Trained models on HuggingFace Hub
| Model name | Config file | URL |
|----------|----------|----------|
| llama3_8b_chat_uncensored | configs/llama3_8b_chat_uncensored.yaml | https://huggingface.co/georgesung/llama3_8b_chat_uncensored |
| llama2_7b_openorca_35k | configs/llama2_7b_openorca_35k.yaml | https://huggingface.co/georgesung/llama2_7b_openorca_35k |
| llama2_7b_chat_uncensored | configs/llama2_7b_chat_uncensored.yaml | https://huggingface.co/georgesung/llama2_7b_chat_uncensored |
| open_llama_7b_qlora_uncensored | configs/open_llama_7b_qlora_uncensored.yaml | https://huggingface.co/georgesung/open_llama_7b_qlora_uncensored |

## Inference
Simple sanity check:
```
python inference.py
```

For notebooks with example inference results, see `inference.ipynb` and this [Colab notebook](https://colab.research.google.com/drive/1CQbUROBZCuxfLa-QopodJDCSfqMLIlLI?usp=sharing).
## Blog post
Blog post describing the QLoRA fine-tuning process: https://georgesung.github.io/ai/qlora-ift/

## Converting to GGUF and quantizing the model
Download and build [llama.cpp](https://github.com/ggerganov/llama.cpp), then follow the instructions in its README to convert the model to GGUF and quantize it to the desired specs.

*Tip*: If llama.cpp reports that the number of tokens differs between the model and `tokenizer.json`, it may be because we added a pad token (e.g. when training Llama). One workaround is to copy the original `tokenizer.json` from the base model (you can find the base model in the HuggingFace cache at `~/.cache/huggingface/`) to the new model's location, but make sure to back up your `tokenizer.json` first!
*Tip*: Llama3 uses a BPE tokenizer, so make sure to specify `--vocab-type bpe` when converting to GGUF.
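The backup-and-copy workaround from the `tokenizer.json` tip can be scripted with the standard library. This is a sketch under assumptions: `restore_base_tokenizer` and the `tokenizer.json.bak` backup name are hypothetical, and you still have to locate the base model's `tokenizer.json` yourself (the tip points to `~/.cache/huggingface/`).

```python
import shutil
from pathlib import Path


def restore_base_tokenizer(model_dir, base_tokenizer_json):
    """Back up model_dir/tokenizer.json, then overwrite it with the base model's copy.

    Hypothetical helper illustrating the workaround from the tip. Returns the
    backup path, or None if the new model had no tokenizer.json to back up.
    """
    model_dir = Path(model_dir)
    target = model_dir / "tokenizer.json"
    backup = model_dir / "tokenizer.json.bak"  # hypothetical backup file name

    backup_made = None
    if target.exists():
        # Never clobber an earlier backup: it may be the only copy of the
        # fine-tuned tokenizer (the one that includes the added pad token).
        if backup.exists():
            raise FileExistsError(f"{backup} already exists; move it before retrying")
        shutil.copy2(target, backup)
        backup_made = backup

    # Replace the fine-tuned tokenizer.json with the base model's original one.
    shutil.copy2(base_tokenizer_json, target)
    return backup_made
```

For example, `restore_base_tokenizer("path/to/new_model", "path/to/base/tokenizer.json")` leaves the fine-tuned tokenizer in `path/to/new_model/tokenizer.json.bak`, so you can undo the swap if conversion still fails.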
## Troubleshooting
### Issues with Python 3.7
If you're using Python 3.7, pip will install `transformers` 4.30.x, since `transformers>=4.31.0` [no longer supports Python 3.7](https://github.com/huggingface/transformers/releases/tag/v4.31.0). If you then install the latest version of `peft`, GPU memory consumption will be higher than usual. The workaround is to use an older version of `peft` that matches the older `transformers` version you installed. Update your `requirements.txt` as follows:
```
transformers==4.30.2
git+https://github.com/huggingface/peft.git@86290e9660d24ef0d0cedcf57710da249dd1f2f4
```
Of course, make sure to remove the original `transformers` and `peft` lines, then run `pip install -r requirements.txt`.
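If you prefer to apply the Python 3.7 edit programmatically, the sketch below rewrites the contents of a `requirements.txt`: it drops any line that installs `transformers` or `peft` (as a plain requirement or as a `.git` URL) and appends the two pins from the Troubleshooting section. The function names are hypothetical, and the matching rules are assumptions about how those lines look in your file, so review the result before running `pip install -r requirements.txt`.

```python
import re
from pathlib import Path

# Pins copied from the Troubleshooting section (Python 3.7 workaround).
PY37_PINS = [
    "transformers==4.30.2",
    "git+https://github.com/huggingface/peft.git@86290e9660d24ef0d0cedcf57710da249dd1f2f4",
]


def installs_package(line, package):
    """Return True if a requirements line installs `package`.

    Matches plain specifiers such as "peft", "peft>=0.4.0" or "transformers[torch]",
    and VCS URLs containing "/<package>.git".
    """
    s = line.strip().lower()
    name = re.escape(package)
    # The lookahead stops "peft" from matching an unrelated name like "peft-extra".
    if re.match(rf"{name}(?![\w.-])", s):
        return True
    return f"/{package}.git" in s


def pin_for_python37(requirements_text):
    """Drop existing transformers/peft lines and append the Python 3.7 pins."""
    kept = [
        line
        for line in requirements_text.splitlines()
        if not (installs_package(line, "transformers") or installs_package(line, "peft"))
    ]
    return "\n".join(kept + PY37_PINS) + "\n"


def update_requirements_file(path="requirements.txt"):
    """Rewrite the requirements file in place (review it before installing)."""
    p = Path(path)
    p.write_text(pin_for_python37(p.read_text()))
```

Working on the text first (`pin_for_python37`) and touching the file only in `update_requirements_file` keeps the filtering easy to preview before anything is overwritten.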