Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
https://github.com/HuangOwen/Awesome-LLM-Compression
Last synced: 5 days ago
-
Papers
-
Pruning and Sparsity
- [Paper - PruMerge)
- [Paper
- [Paper
- [Paper
- [Paper - Pruner)
- [Paper - DASLab/ZipLM)
- [Paper
- [Paper
- [Paper - Group/essential_sparsity)
- [Paper
- [Paper
- [Paper
- [Paper - DASLab/sparsegpt)
- [Paper
- [Paper - science/llm-interpret)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - llm)
- [Paper
- [Paper - Group/Junk_DNA_Hypothesis)
- [Paper
- [Paper
- [Paper - nlp/LLM-Shearing)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - Pruner)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - pruning)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - Mozaffari/slim)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - Aware-Automated-Machine-Learning/tree/main/Shears)
- [Paper
- [Paper
- [Paper - Pruner)
- [Paper
- [Paper - DASLab/EvoPress)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - han-lab/Quest)
- [Paper - Barber)
- [Paper
- [Paper - Aware-Tuning)
- [Paper
- [Paper
- [Paper
- [Paper - cybernetics/Relative-importance-and-activation-pruning)
- [Paper - AI-Lab/Sirius)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - Aware-Automated-Machine-Learning/tree/main/LoNAS)
- [Paper - Aware-Automated-Machine-Learning/tree/main/SQFT)
- [Paper - Zero)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - PEFT)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - He/LLM-Drop)
- [Paper
- [Paper - lab/shadow_llm/)
- [Paper - nlp/Edge-Pruning)
- [Paper - research/EEP)
- [Paper
- [Paper
- [Paper
-
Distillation
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - nlp/LaMini-LM)
- [Paper
- [Paper
- [Paper - ai/gpt4all)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - kd)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - NeMo-Minitron-8B-Base)
- [Paper - Collaborative-Knowledge-Distillation)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - distillation)
- [Paper
- [Paper
- [Paper
- [Paper - neo-66e3c882f5579b829ff57eba)
-
Efficient Prompting
- [Paper - for-Prompt-Compression)
- [Paper - instruction-effectiveness)
- [Paper - prompting)
- [Paper - nlp/AutoCompressors)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - mllab/context-memory)
- [Paper
- [Paper
- [Paper
- [Paper - COCO)
- [Paper
- [Paper - Influx)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
-
Quantization
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - han-lab/smoothquant)
- [Paper
- [Paper - tYCaP0phY_&name=supplementary_material)
- [Paper
- [Paper - DASLab/gptq)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - han-lab/llm-awq)
- [Paper - QAT)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - ai/INT-FP-QSim)
- [Paper - DASLab/QIGen)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - compressor)
- [Paper - lora)
- [Paper
- [Paper - LLM)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - compressor)
- [Paper - Transformers)
- [Paper - AMP)
- [Paper - DASLab/QUIK)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - AI-research/outlier-free-transformers)
- [Paper - extension-for-transformers)
- [Paper
- [Paper - 98/llm-mixed-q)
- [Paper
- [Paper - Watermark)
- [Paper
- [Paper - FP4)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - chee/QuIP)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - dmx/project-resq)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - 778/SliM-LLM)
- [Paper
- [Paper
- [Paper - easl/deltazip)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - llm)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - 778/BiLLM)
- [Paper - RelaxML/quip-sharp)
- [Paper
- [Paper
- [Paper
- [Paper - qlora)
- [Paper
- [Paper
- [Paper - DuDa/BitDistiller)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - Lab/moe-quantization)
- [Paper
- [Paper
- [Paper - RelaxML/qtip)
- [Paper - han-lab/qserve)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - Quantization) [[Model]](https://huggingface.co/LLMQ)
- [Paper
- [Paper - pretrain)
- [Paper
- [Paper
- [Paper - Point-RND/GIFT_SW-v2-Gaussian-noise-Injected-Fine-Tuning-of-Salient-Weights-for-LLMs)
- [Paper
- [Paper
- [Paper
- [Paper - DASLab/marlin) [[Code (Sparse Marlin)]](https://github.com/IST-DASLab/Sparse-Marlin)
- [Paper
- [Paper - fi/MobileQuant)
- [Paper
- [Paper
- [Paper - MLSys-Lab/SVD-LLM)
- [Paper - ai-research/gptvq)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - Ouyang)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - compensation)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - LLM)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - Computing-Lab-Yale/TesseraQ)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - ml/SageAttention)
- [Paper
- [Paper
- [Paper - lab/MX-QLLM)
- [Paper
- [Paper - HPCA-25)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - EIC/ShiftAddLLM)
- [Paper
- [Paper
- [Paper - round)
- [Paper - EIC/Edge-LLM)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - MAC)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - Group/Q-GaLore)
- [Paper - AILab/flash-attention)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - lab/EfficientLLMs)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
-
Survey
- [Paper
- [Paper
- [Paper - MLSys-Lab/Efficient-LLMs-Survey)
- [Paper
- [Paper
- [Paper
- [Paper - LLMs-on-device) [[Download On-device LLMs]](https://nexaai.com/models)
- [Paper - Knowledge-Distillation-of-LLMs)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - LLM-Survey)
- [Paper
-
Other
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - Scheduling)
- [Paper
- [Paper
- [Paper
- [Paper - research/LongLoRA)
- [Paper
- [Paper
- [Paper - han-lab/streaming-llm)
- [Paper
- [Paper
- [Paper
- [Paper - research/Dataset_Quantization)
- [Paper - zha/Align)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - han-lab/duo-attention)
- [Paper - IPADS/PowerInfer)
- [Paper - Lab-UMD/Unified-MoE-Compression)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - lin/RapidIn)
- [Paper
- [Paper - AILab/flash-attention)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - ai-lab/Consistency_LLM)
- [Paper - AI-Lab/TriForce)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - ml/SageAttention)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
-
KV Cache Compression
- [Paper
- [Paper
- [Paper
- [Paper - ai/lexico)
- [Paper
- [Paper
- [Paper
- [Paper - ai/SCOPE)
- [Paper
- [Paper - KVCacheQuantization)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - yuan/KIVI)
- [Paper
- [Paper - sg/SimLayerKV)
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper
- [Paper - NACL)
- [Paper
-
-
Tools
Keywords
- llama (8)
- quantization (8)
- llm (8)
- large-language-models (7)
- gpt (3)
- pytorch (3)
- pruning (3)
- transformer (2)
- c (2)
- cpp (2)
- language-model (2)
- deep-learning (2)
- post-training-quantization (2)
- ai (2)
- llamacpp (2)
- llama3 (2)
- llama2 (2)
- quantization-aware-training (2)
- lora (2)
- instruction-tuning (2)
- ggml (2)
- alpaca (2)
- transformers (2)
- chatglm (2)
- onnx (1)
- onnxruntime (1)
- rwkv (1)
- model-para (1)
- lama (1)
- lamacpp (1)
- python (1)
- tensorrt (1)
- chatgpt (1)
- cot (1)
- moss (1)
- p-tuning (1)
- parameter-efficient (1)
- tabul (1)
- tabular-data (1)
- tabular-model (1)
- amd (1)
- cuda (1)
- hpu (1)
- inference (1)
- inferentia (1)
- llm-serving (1)
- llmops (1)
- mlops (1)
- model-serving (1)
- rocm (1)