https://github.com/opensparsellms/linearization
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/opensparsellms/linearization
- Owner: OpenSparseLLMs
- License: apache-2.0
- Created: 2025-02-06T03:39:38.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-20T13:36:05.000Z (about 1 year ago)
- Last Synced: 2025-03-31T06:41:19.162Z (about 1 year ago)
- Language: Python
- Size: 475 KB
- Stars: 37
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Liger: Linearizing Large Language Models to Gated Recurrent Structures
[arXiv](https://arxiv.org/abs/2503.01496)
[Hugging Face collection](https://huggingface.co/collections/linear-moe-hub/liger-67d904bffd7f9b77ade7747d)
## Overview
Figure 1: Liger Linearization Framework
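As a rough intuition for the "gated recurrent structure" in the title: instead of attending over all previous tokens, each layer keeps a fixed-size state that a gate decays and each new token updates. The NumPy sketch below assumes the generic gated linear attention (GLA) formulation, `S_t = diag(alpha_t) · S_{t-1} + k_tᵀ v_t` and `o_t = q_t S_t`, and checks it against the equivalent quadratic "causal attention with cumulative decay" form. It illustrates the general technique only; it is not Liger's actual layer, and how Liger derives its recurrent layers from the pretrained attention weights is described in the paper.

```python
import numpy as np


def gla_recurrent(q, k, v, alpha):
    """Generic gated linear attention in recurrent form (illustrative only).

    q, k, alpha: arrays of shape (T, d_k); v: shape (T, d_v).
    alpha holds per-channel forget gates in (0, 1).
    Returns outputs of shape (T, d_v) using a fixed (d_k, d_v) state.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    state = np.zeros((d_k, d_v))
    out = np.empty((T, d_v))
    for t in range(T):
        # Decay each key channel of the state, then add the new key/value outer product.
        state = alpha[t][:, None] * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out


def gla_parallel(q, k, v, alpha):
    """The same outputs computed as causal attention with cumulative decay (O(T^2))."""
    T = q.shape[0]
    out = np.zeros((T, v.shape[1]))
    for t in range(T):
        for s in range(t + 1):
            # Product of the gates applied after step s; an empty product (s == t) is all ones.
            decay = np.prod(alpha[s + 1 : t + 1], axis=0)
            out[t] += ((q[t] * decay) @ k[s]) * v[s]
    return out


# Example: both forms agree on random inputs.
rng = np.random.default_rng(0)
T, d_k, d_v = 6, 4, 3
q = rng.normal(size=(T, d_k))
k = rng.normal(size=(T, d_k))
v = rng.normal(size=(T, d_v))
alpha = rng.uniform(0.5, 1.0, size=(T, d_k))
assert np.allclose(gla_recurrent(q, k, v, alpha), gla_parallel(q, k, v, alpha))
```

The recurrent form is what makes inference cost constant per token: the state has shape `(d_k, d_v)` regardless of sequence length. With all gates set to 1 it reduces to plain (ungated) causal linear attention.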
## Environment
```bash
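git clone --recurse-submodules https://github.com/OpenSparseLLMs/Linearization.git
cd Linearization
conda create -n liger python=3.10
conda activate liger
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
# install the bundled flash-linear-attention submodule in editable mode
cd third_party/flash-linear-attention
pip install -e .
# return to the repository root
cd ../..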
```
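Before launching a long linearization run, it can help to confirm that the compiled kernels are importable. A minimal sketch, assuming the import names `flash_attn` (for `flash-attn`) and `fla` (for the `flash-linear-attention` submodule), plus `torch` as a prerequisite of both; `check_environment` is a hypothetical helper, not part of this repo. It only locates top-level modules with `importlib.util.find_spec`, so it does not import them or initialize CUDA.

```python
import importlib.util
import sys

# Assumed import names for the packages installed above (hypothetical list).
REQUIRED_MODULES = ("torch", "flash_attn", "fla")


def check_environment(modules=REQUIRED_MODULES):
    """Return {module_name: True if the top-level module can be found}."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # The conda environment above is created with Python 3.10.
    if sys.version_info[:2] != (3, 10):
        print(f"warning: expected Python 3.10, found {sys.version.split()[0]}")
    for name, ok in check_environment().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```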
## Linearization
1. Copy your base model weights (e.g. Qwen3-8B) to `./checkpoints/` and rename the directory to `liger_qwen3_gla_base`;
2. Edit `config.json` in `liger_qwen3_gla_base`, setting `"architectures"` and `"model_type"` to the values of the linearized model;
3. Adjust the linearization settings under `configs/` (e.g. `liger_qwen3_gla.yaml`);
4. Run the linearization script: `sh scripts/train_liger.sh`
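Steps 1 and 2 can also be scripted. The sketch below is a hypothetical helper (`prepare_liger_checkpoint` is not part of this repo) that copies a local base checkpoint into `./checkpoints/` and rewrites the two fields in `config.json`, leaving every other field untouched. The correct `"architectures"` and `"model_type"` strings depend on the model classes this repo registers, so the values in the usage comment are placeholders, not real names.

```python
import json
import shutil
from pathlib import Path


def prepare_liger_checkpoint(base_dir, target_dir, architectures, model_type):
    """Copy base model files to target_dir and patch its config.json.

    Raises FileExistsError if target_dir already exists, so an existing
    checkpoint is never overwritten silently.
    """
    base_dir, target_dir = Path(base_dir), Path(target_dir)
    # Step 1: copy the base weights under the new directory name.
    shutil.copytree(base_dir, target_dir)
    config_path = target_dir / "config.json"
    config = json.loads(config_path.read_text())
    # Step 2: point the config at the linearized model classes.
    config["architectures"] = list(architectures)
    config["model_type"] = model_type
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Usage (paths and names are hypothetical placeholders):
# prepare_liger_checkpoint(
#     "/path/to/Qwen3-8B",
#     "./checkpoints/liger_qwen3_gla_base",
#     architectures=["<LinearizedArchitectureName>"],
#     model_type="<linearized_model_type>",
# )
```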
## Evaluation
You need to install [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for evaluation:
```bash
cd third_party/lm-evaluation-harness
pip install -e .
# return to the repository root before running the evaluation
cd ../..
```
```bash
python -m eval.harness --model hf \
--model_args pretrained=/your/Liger/checkpoints/liger_base_model,peft=/your/Liger/checkpoints/lora_adapter_path \
--tasks piqa,arc_easy,arc_challenge,hellaswag,winogrande \
--batch_size 64 \
--device cuda \
--seed 0
```
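`--model_args` is parsed as a single comma-separated `key=value` string, so a space after the comma splits it into two shell arguments. When sweeping several checkpoints, one way to avoid quoting mistakes is to build the argument list in Python. `build_eval_command` is a hypothetical helper that only mirrors the flags in the command above; the paths are placeholders.

```python
import shlex

# Tasks used in the evaluation command above.
DEFAULT_TASKS = ("piqa", "arc_easy", "arc_challenge", "hellaswag", "winogrande")


def build_eval_command(pretrained, peft=None, tasks=DEFAULT_TASKS,
                       batch_size=64, device="cuda", seed=0):
    """Return the argv list for the evaluation command (hypothetical helper)."""
    # One argument, comma-separated, no spaces.
    model_args = f"pretrained={pretrained}"
    if peft is not None:
        model_args += f",peft={peft}"
    return [
        "python", "-m", "eval.harness",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
        "--device", device,
        "--seed", str(seed),
    ]


cmd = build_eval_command(
    "/your/Liger/checkpoints/liger_base_model",
    peft="/your/Liger/checkpoints/lora_adapter_path",
)
print(shlex.join(cmd))  # shell-safe rendering of the same command
```

Passing the list directly to `subprocess.run(cmd, check=True)` from the repository root runs the evaluation without going through shell word-splitting at all.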
## Acknowledgements
We use the Triton-implemented linear attention kernels from [fla-org/flash-linear-attention](https://github.com/fla-org/flash-linear-attention). Our linearization training process builds on [HazyResearch/lolcats](https://github.com/HazyResearch/lolcats). Evaluation is supported by [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). We sincerely thank them for their contributions!
## Citation
If you find this repo useful, please cite and star our work:
```bibtex
@article{lan2025liger,
title={Liger: Linearizing Large Language Models to Gated Recurrent Structures},
author={Lan, Disen and Sun, Weigao and Hu, Jiaxi and Du, Jusen and Cheng, Yu},
journal={arXiv preprint arXiv:2503.01496},
year={2025}
}
```