https://github.com/SqueezeAILab/KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Topics: compression, efficient-inference, efficient-model, large-language-models, llama, llm, localllama, localllm, mistral, model-compression, natural-language-processing, quantization, small-models, text-generation, transformer

Last synced: 5 months ago

Awesome Lists containing this project