https://github.com/fasterdecoding/snapkv
- Host: GitHub
- URL: https://github.com/fasterdecoding/snapkv
- Owner: FasterDecoding
- Created: 2024-03-23T05:36:18.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-01T00:32:13.000Z (almost 2 years ago)
- Last Synced: 2025-07-09T13:56:10.159Z (9 months ago)
- Topics: long-context-modeling
- Language: Python
- Homepage:
- Size: 851 KB
- Stars: 261
- Watchers: 6
- Forks: 19
- Open Issues: 17
Metadata Files:
- Readme: README.md
README
# SnapKV :camera:
We introduce [SnapKV](https://arxiv.org/abs/2404.14469), an innovative, out-of-the-box KV cache compression method.
## Requirements
Currently tested with `transformers==4.37.0`; compatibility with later versions has not yet been verified.
```
transformers>=4.36
flash-attn==2.4.0
```
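Because the patches have only been tested against one `transformers` release, it can help to check the installed version before applying them. The following is a minimal sketch using only the standard library; `major_minor` and `check_transformers` are hypothetical helper names for illustration and are not part of SnapKV.

```python
from __future__ import annotations

from importlib import metadata

# From the Requirements note above: tested with transformers==4.37.0.
TESTED_TRANSFORMERS = (4, 37)


def major_minor(version: str) -> tuple[int, int]:
    """Return the (major, minor) part of a version string such as '4.37.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_transformers(installed: str | None = None) -> str:
    """Describe how the installed transformers version relates to the tested one."""
    if installed is None:
        try:
            installed = metadata.version("transformers")
        except metadata.PackageNotFoundError:
            return "transformers is not installed"
    if major_minor(installed) == TESTED_TRANSFORMERS:
        return f"transformers {installed} matches the tested 4.37 series"
    return f"transformers {installed} differs from the tested 4.37 series"


if __name__ == "__main__":
    print(check_transformers())
```

A mismatch does not necessarily mean SnapKV will fail; it only signals that the combination is outside what has been tested.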
## Installation
```
git clone git@github.com:FasterDecoding/SnapKV.git
cd SnapKV
pip install -e .
```
## Quick Start
### Use SnapKV-optimized Models
For example:
```python
from snapkv.monkeypatch.monkeypatch import replace_mistral
replace_mistral()  # Apply the monkey patch that enables SnapKV
```
Check [the example notebook](./notebooks/example.ipynb).
### Customize Your SnapKV-optimized Models
SnapKV can be easily integrated with other models.
To build your own, follow the comments marked `[SnapKV]` in the [existing models](./snapkv/monkeypatch/monkeypatch.py). Currently supported: the [Llama family](./snapkv/monkeypatch/llama_hijack_4_37.py), [Mistral](./snapkv/monkeypatch/mistral_hijack_4_37.py), and [Mixtral](./snapkv/monkeypatch/mixtral_hijack_4_37.py).
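For readers unfamiliar with monkey patching, the general pattern is to swap a class-level `forward` method for a modified one at runtime. The toy below illustrates that pattern only: `ToyAttention`, `compressed_forward`, and `replace_toy_attention` are hypothetical names, and the "compression" is a placeholder that keeps the most recent tokens, not the SnapKV selection rule.

```python
class ToyAttention:
    """Stand-in for a model's attention module (not a transformers class)."""

    def forward(self, tokens):
        # Original behaviour: keep every token in the cache.
        return list(tokens)


def compressed_forward(self, tokens, keep_last=4):
    # [SnapKV]-style hook point: a real patch would compress the KV cache here.
    # This placeholder simply keeps the most recent `keep_last` tokens.
    return list(tokens)[-keep_last:]


def replace_toy_attention():
    """Swap the class-level forward so existing and new instances use the patch."""
    ToyAttention.forward = compressed_forward


layer = ToyAttention()
before = layer.forward(range(10))
replace_toy_attention()
after = layer.forward(range(10))
print(len(before), len(after))  # prints: 10 4
```

Patching the class rather than an instance means the replacement must be applied before (or can be applied after) the model is constructed, and it affects every layer of that type at once.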
The SnapKV algorithm itself is implemented in [`snapkv_utils.py`](./snapkv/monkeypatch/snapkv_utils.py).
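As a rough orientation before reading that file, the idea described in the paper can be sketched for a single attention head in NumPy: the last few prompt queries (an observation window) vote on which earlier positions matter, the votes are smoothed by pooling, and only the top-scoring positions plus the window itself are kept. This is a simplified sketch under assumptions, not the repository's implementation; the function name `snapkv_select` and the parameters `window`, `capacity`, and `kernel` are illustrative, and the kept indices are sorted here purely for readability.

```python
import numpy as np


def snapkv_select(queries, keys, values, window=4, capacity=8, kernel=3):
    """Compress one head's prompt KV cache to at most `capacity` entries.

    queries, keys, values: arrays of shape (seq_len, head_dim).
    Returns the compressed keys, values, and the kept position indices.
    """
    seq_len, head_dim = keys.shape
    if capacity <= window:
        raise ValueError("capacity must exceed window")
    if seq_len <= capacity:
        return keys, values, np.arange(seq_len)  # nothing to compress

    prefix_len = seq_len - window
    # 1. Causal attention of the observation window over all keys.
    scores = queries[-window:] @ keys.T / np.sqrt(head_dim)
    q_pos = np.arange(prefix_len, seq_len)[:, None]
    k_pos = np.arange(seq_len)[None, :]
    scores = np.where(k_pos <= q_pos, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    # 2. Sum the window's attention to each prefix position ("votes").
    votes = weights[:, :prefix_len].sum(axis=0)

    # 3. Average-pool the votes (zero-padded, stride 1) so neighbours are kept together.
    padded = np.pad(votes, kernel // 2)
    pooled = np.array([padded[i:i + kernel].mean() for i in range(prefix_len)])

    # 4. Keep the top-scoring prefix positions plus the whole observation window.
    keep = capacity - window
    top = np.sort(np.argsort(pooled)[-keep:])
    idx = np.concatenate([top, np.arange(prefix_len, seq_len)])
    return keys[idx], values[idx], idx


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((32, 16)) for _ in range(3))
k_small, v_small, kept = snapkv_select(q, k, v, window=4, capacity=12)
print(k_small.shape, kept)
```

With a 32-token prompt, a window of 4, and a capacity of 12, the compressed cache holds 12 entries: 8 selected prefix positions followed by the window positions 28 to 31. A real model would run this selection per head and per layer on batched tensors.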
## Partial Results


## TODO
- [ ] Add observation experiments for reproduction.
- [ ] Add LongBench for reproduction.
- [ ] Explore prompt-phase compression.
## Citation
If you find this project helpful, please consider citing our report :blush:
```
@article{li2024snapkv,
title={SnapKV: LLM Knows What You are Looking for Before Generation},
author={Li, Yuhong and Huang, Yingbing and Yang, Bowen and Venkitesh, Bharat and Locatelli, Acyr and Ye, Hanchen and Cai, Tianle and Lewis, Patrick and Chen, Deming},
journal={arXiv preprint arXiv:2404.14469},
year={2024}
}
```