https://github.com/SqueezeAILab/KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
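As background for the title above: the project targets shrinking the Key/Value attention cache to a few bits per value so that very long contexts fit in memory. The snippet below is a minimal NumPy sketch of the basic ingredient, per-axis uniform quantization of a KV tensor, under the assumption (reported in the linked paper) that Key activations contain outlier channels. It is not the repository's code or API; `quantize_dequantize` and its parameters are illustrative names only.

```python
import numpy as np

def quantize_dequantize(x, n_bits=4, axis=0):
    """Simulate uniform n-bit quantization of x with scales shared along `axis`.

    Computes a min/max range per slice (reducing over `axis`), rounds to
    integers in [0, 2**n_bits - 1], then dequantizes back to float so the
    rounding error can be measured.
    """
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    qmax = 2 ** n_bits - 1
    scale = (hi - lo) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard constant slices
    q = np.clip(np.round((x - lo) / scale), 0, qmax)
    return q * scale + lo

# Toy Key cache: (seq_len, head_dim). Per the paper's observation, Keys
# tend to have outlier *channels*, so giving each channel its own scale
# (statistics over tokens, axis=0) isolates the outlier, while per-token
# scales (axis=1) let one outlier channel inflate every token's range.
rng = np.random.default_rng(0)
K = rng.normal(size=(128, 64))
K[:, 7] *= 20.0  # inject a single outlier channel

for name, axis in [("per-channel scales", 0), ("per-token scales", 1)]:
    err = np.abs(quantize_dequantize(K, n_bits=4, axis=axis) - K).mean()
    print(f"Key, {name}: mean abs error = {err:.4f}")
```

On this toy tensor the per-channel variant should print a much smaller error, since the injected outlier channel receives its own scale instead of inflating the quantization range of every token.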
- Host: GitHub
- URL: https://github.com/SqueezeAILab/KVQuant
- Owner: SqueezeAILab
- Created: 2024-01-31T17:30:10.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-13T11:19:28.000Z (about 1 year ago)
- Last Synced: 2025-04-30T02:04:45.176Z (5 months ago)
- Topics: compression, efficient-inference, efficient-model, large-language-models, llama, llm, localllama, localllm, mistral, model-compression, natural-language-processing, quantization, small-models, text-generation, transformer
- Language: Python
- Homepage: https://arxiv.org/abs/2401.18079
- Size: 19.8 MB
- Stars: 342
- Watchers: 10
- Forks: 30
- Open Issues: 15
Metadata Files:
- Readme: README.md
Awesome Lists containing this project:
- StarryDivineSky - SqueezeAILab/KVQuant