https://github.com/yangjianxin1/LongQLoRA

LongQLoRA: Extent Context Length of LLMs Efficiently
https://github.com/yangjianxin1/LongQLoRA

llm long-context longlora lora qlora

Last synced: 7 days ago
JSON representation

LongQLoRA: Extent Context Length of LLMs Efficiently

Host: GitHub
URL: https://github.com/yangjianxin1/LongQLoRA
Owner: yangjianxin1
Created: 2023-10-22T08:49:25.000Z (about 2 years ago)
Default Branch: master
Last Pushed: 2023-11-12T04:15:10.000Z (almost 2 years ago)
Last Synced: 2025-02-01T04:51:10.846Z (9 months ago)
Topics: llm, long-context, longlora, lora, qlora
Language: Python
Homepage:
Size: 1.92 MB
Stars: 162
Watchers: 2
Forks: 15
Open Issues: 4
Metadata Files:
- Readme: README.MD

Awesome Lists containing this project

StarryDivineSky - yangjianxin1/LongQLoRA - 7B-8K。 (A01_文本生成_文本对话 / 大语言对话模型及数据)

README

# LongQLoRA: Efficient and Effective Method to Extend Context Length of LLMs

## Technical Report

Technical Report: [LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models](https://arxiv.org/abs/2311.04879)

## Introduction
LongQLoRA is a memory-efficient and effective method to extend context length of Large Language Models with less training GPUs.
**On a single 32GB V100 GPU**, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k.
LongQLoRA achieves competitive perplexity performance on PG19 and Proof-pile dataset after only 1000 finetuning steps, our model outperforms LongLoRA and is very close to MPT-7B-8K.

Evaluation perplexity on PG19 validation and Proof-pile test datasets in evaluation context length of 8192:

| Model | PG19 | Proof-pile |
|---------------------|----------|------------|
| LLaMA2-7B | \>1000 | \>1000 |
| MPT-7B-8K | 7.98 | 2.67 |
| LongLoRA-LoRA-7B-8K | 8.20 | 2.78 |
| LongLoRA-Full-7B-8K | 7.93 | 2.73 |
| **LongQLoRA-7B-8K** | **7.96** | **2.73** |

Evaluation perplexity of 7B models on PG19 validation and Proof-pile test datasets in evaluation context
length from 1024 to 8192:

## Dataset
We sample about 54k long text from Redpajama dataset to finetune pretrained models, whose token lengths ranging from 4096 to 32768.

We also build a long context instruction dataset for supervised finetuning chat models. This dataset contains
39k instruction data, mainly including book summarization, Natural Questions, subset of LongQA and Evol-Instruct of WizardLM.
In order to adapt to the target length of 8192, the max token number of each data is 8192. The distribution is as follows.

| Dataset | Description |
|---------------------------------------------------------------------------------------------|---------------------------------------------------------|
| 🤗[LongQLoRA-Pretrain-Data-54k](https://huggingface.co/datasets/YeungNLP/LongQLoRA-Dataset) | Include 54212 data, used to finetune pretrained model |
| 🤗[LongQLoRA-SFT-Data-39k](https://huggingface.co/datasets/YeungNLP/LongQLoRA-Dataset) | Include 38821 data, used to finetune chat model |

## Model

| Model | Context Length | Description |
|-----------------------------------------------------------------------------------------------|-----------------|-------------------------------------------------------------------------------|
| 🤗[LongQLoRA-Llama2-7b-8k](https://huggingface.co/YeungNLP/LongQLoRA-Llama2-7b-8k) | 8192 | Finetuned with LongQLoRA-Pretrain-Data-54k for 1k steps based on LLaMA2-7B |
| 🤗[LongQLoRA-Vicuna-13b-8k](https://huggingface.co/YeungNLP/LongQLoRA-Vicuna-13b-8k) | 8192 | Finetuned with LongQLoRA-SFT-Data-39k for 1.7k steps based on Vicuna-13B-V1.5 |
| 🤗[LongQLoRA-Llama2-7b-8k-lora](https://huggingface.co/YeungNLP/LongQLoRA-Llama2-7b-8k-lora) | 8192 | LoRA weights |
| 🤗[LongQLoRA-Vicuna-13b-8k-lora](https://huggingface.co/YeungNLP/LongQLoRA-Vicuna-13b-8k) | 8192 | LoRA weights |

## Training
The training configs are saved in the train_args directory, some parameters are as follows:
- `sft`: Do sft task if set as True, otherwise do pretraining task.
- `model_max_length`: The target context length.
- `max_seq_length`: The max sequence length in training, should be less than or equal to model_max_length
- `logging_steps`: Log training loss every n steps.
- `save_steps`: Save model every n steps.
- `lora_rank`: The LoRA rank in training.

Extend context length of pretrained model LLaMA2-7B:
```bash
deepspeed train.py --train_args_file ./train_args/llama2-7b-pretrain.yaml
```

Extend context length of chat Model Vicuna-13B:
```bash
deepspeed train.py --train_args_file ./train_args/vicuna-13b-sft.yaml
```

## Inference
You can merge the lora weight to base model:
```bash
cd script
python merge_lora.py
```

Inference with pretrained model:
```bash
cd script/inference
python inference.py
```

Chat with chat model:
```bash
cd script/inference
python chat.py
```

## Evaluation
Download the evaluation dataset tokenized by LLaMA2 by [LongLoRA](https://github.com/dvlab-research/LongLoRA).

| Dataset |
|-------------------------------------------------------------------------------------|
| 🤗[PG19-validation.bin](https://huggingface.co/datasets/YeungNLP/LongQLoRA-Dataset) |
| 🤗[PG19-test.bin](https://huggingface.co/datasets/YeungNLP/LongQLoRA-Dataset) |
| 🤗[Proof-pile-test.bin](https://huggingface.co/datasets/YeungNLP/LongQLoRA-Dataset) |

Evaluate the perplexity of models. You can set `load_in_4bit` as True to save memory:
```bash
cd script/evaluate
python evaluate.py \
--batch_size 1 \
--base_model YeungNLP/LongQLoRA-Llama2-7b-8k \
--seq_len 8192 \
--context_size 8192 \
--sliding_window 8192 \
--data_path pg19-validation.bin
```

Evaluate the perplexity of models with LoRA weights:
```bash
cd script/evaluate
python evaluate.py \
--batch_size 1 \
--base_model YeungNLP/LongQLoRA-Llama2-7b-8k \
--peft_model LongQLoRA-Llama2-7b-8k-lora\
--seq_len 8192 \
--context_size 8192 \
--sliding_window 8192 \
--data_path pg19-validation.bin
```

## Examples

The examples generated by [LongQLoRA-Vicuna-13b-8k](https://huggingface.co/YeungNLP/LongQLoRA-Vicuna-13b-8k) ars as follows.

Examples of long context generartion, the input context lengths are between 4096 and 8192 which are larger than original context length of LLaMA2.

Examples of short context generartion, model keep the performance of short instruction following.

## Citation
```text
@misc{yang2023longqlora,
title={LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models},
author={Jianxin Yang},
year={2023},
eprint={2311.04879},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

## Acknowledgement
- This work combines the advantages of [QLoRA](https://github.com/artidoro/qlora), [Position Interpolation](https://arxiv.org/abs/2306.15595) and [LongLoRA](https://github.com/dvlab-research/LongLoRA)
- The code of Shift Short Attention is implemented by [LongLoRA](https://github.com/dvlab-research/LongLoRA)
- The training code is modified upon [Firefly](https://github.com/yangjianxin1/Firefly)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/yangjianxin1/LongQLoRA

Awesome Lists containing this project

README