- Host: GitHub
- URL: https://github.com/hanguo97/lq-lora
- Owner: HanGuo97
- License: mit
- Created: 2023-11-20T17:56:46.000Z
- Default Branch: main
- Last Pushed: 2024-01-22T03:24:39.000Z
- Last Synced: 2024-01-22T04:28:38.376Z
- Language: Python
- Size: 91.8 KB
- Stars: 86
- Watchers: 4
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-llm-deployment - LQ-LoRA - Low-rank plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
README
# LQ-LoRA: Low-rank plus Quantized Matrix Decomposition for Efficient Language Model Finetuning [[Paper](https://arxiv.org/abs/2311.12023)]
## Changelog
- 20240121: Deprecated the legacy dequantization, and included additional evaluation.
- 20231215: Uploaded artifacts.

## Artifacts
- Model checkpoint (and training logs) for LLaMA-2 7B with LQ-LoRA (2.75-bits, 64-rank, Fisher) [[link]](https://huggingface.co/hanguo/lq-lora/tree/main/output_c4wiki_lpq_20231022_ranks64_7b_c4_2.75)
- Model checkpoint (and training logs) for LLaMA-2 70B with LQ-LoRA (2.75-bits, 64-rank, Fisher) [[link]](https://huggingface.co/hanguo/lq-lora/tree/main/output_c4wiki_lpq_20231022_ranks64_70b_c4_2.75)
- Pre-computed ILP data for LLaMA-2 7B [[link]](https://huggingface.co/hanguo/lq-lora/tree/main/ilp_data_fp32)
- Pre-computed ILP data for LLaMA-2 70B [[link]](https://huggingface.co/hanguo/lq-lora/tree/main/ilp_data_bf16)
- Fisher Information for LLaMA-2 7B [[link]](https://huggingface.co/hanguo/lq-lora/tree/main/fisher_dict_fp32)
- Fisher Information for LLaMA-2 70B: the file exceeds the upload size limit, please contact us!

## Installation
1. Clone the repo
```bash
git clone https://github.com/HanGuo97/lq-lora.git
cd lq-lora
```

2. Create Docker image (optional)
```bash
# Using BuildKit
DOCKER_BUILDKIT=1 docker build \
    -t lqlora \
    -f Dockerfile \
    .

docker run -ti --rm \
    --gpus all \
    -p 28888:8888 \
    --shm-size=2g \
    lqlora \
    bash -c "cd main/ && jupyter-lab --ip=0.0.0.0 --allow-root"
```

3. Install dependencies
```bash
bash scripts/setup.sh
```

**Note**: Parts of the codebase require PyTorch>=2.1.
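Before running the scripts, it can be worth verifying the installed PyTorch version. The helper below is a minimal sketch (it is not part of this repo) that parses version strings such as `"2.1.0+cu121"` and compares the major/minor components:

```python
# A sketch of a version check; not shipped with the repo.
def meets_min_version(ver: str, minimum: tuple = (2, 1)) -> bool:
    """Return True if `ver` is at least `minimum` (major, minor)."""
    core = ver.split("+")[0]  # drop local build tags like "+cu121"
    major, minor = (int(p) for p in core.split(".")[:2])
    return (major, minor) >= minimum

# Example usage with the installed torch:
#   import torch
#   assert meets_min_version(torch.__version__), "LQ-LoRA needs PyTorch>=2.1"
```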
## Usage
### Downloading Data for Quantization
After downloading the required files, update `FILE_NAMES_DICT` in `models/allocation_utils` to point to their locations.
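As a rough illustration, the update amounts to mapping each set of downloaded data to its local path. The keys and paths below are hypothetical; the actual structure of `FILE_NAMES_DICT` is defined in `models/allocation_utils` and may differ:

```python
# Hypothetical illustration only; consult models/allocation_utils
# for the real keys that the codebase expects.
FILE_NAMES_DICT = {
    "llama-2-7b": "/path/to/downloaded/ilp_data_fp32",
    "llama-2-70b": "/path/to/downloaded/ilp_data_bf16",
}
```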
### Applying Quantization
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from models import lora_utils

data = "c4"         # applying data-aware quantization
budget = "2.75"     # target bits
model_size = "70b"  # 7b or 70b

# Loads the base model (to CPU)
model = AutoModelForCausalLM.from_pretrained(
    f"meta-llama/Llama-2-{model_size}-hf")

# Adds LoRA components, etc.
model = lora_utils.prepare_model_for_lora(
    model=model,
    num_ranks=64,
    lora_alpha=16,
    lora_dropout=0.0,
    use_gradient_checkpointing=True)

# Applies LQ-LoRA to the model.
lora_utils.transform_lora_layers(
    lpq=True,
    model=model,
    model_name=f"llama-2-{model_size}/lpq-64/{data},budget={budget}",
    device="cuda")
```

### Saving Quantized Models
Note that HuggingFace's PEFT library only saves the adapter parameters. Since LQ-LoRA additionally modifies the base model parameters, the entire model weights need to be saved.
```python
import os

import torch

state_dict = model.state_dict()
file_name = os.path.join(
    output_dir,
    "full_model.pth")
torch.save(state_dict, file_name)
```

### Loading Quantized Models
```python
# No need to apply `transform_lora_layers` because
# these will be loaded from the checkpoint.
model = lora_utils.prepare_model_for_lora(
    model=model,
    num_ranks=64,
    lora_alpha=16,
    lora_dropout=0.0,
    use_gradient_checkpointing=True,
    checkpoint_dir=checkpoint_dir)  # path to the checkpoint directory
```

## Todos
- [X] Upload the artifacts
- [X] Update the code from the legacy (de)quantization implementation to the latest version.

## Acknowledgement
This code reuses components from several libraries including [QLoRA](https://github.com/artidoro/qlora) and [OmniQuant](https://github.com/OpenGVLab/OmniQuant).