
Awesome-LLM-Inference

📖 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉
https://github.com/xlite-dev/Awesome-LLM-Inference

Sub Categories
- 📖 KV Cache Scheduling/Quantize/Dropping (69)
- 📖 Weight/Activation Quantize/Compress (50)
- 📖 IO/FLOPs-Aware/Sparse Attention (40)
- 📖 Long Context Attention/KV Cache Optimization (38)
- 📖 Parallel Decoding/Sampling (30)
- 📖 LLM Algorithmic/Eval Survey (23)
- 📖 LLM Train/Inference Framework/Design (19)
- 📖 GEMM/Tensor Cores/MMA/Parallel (18)
- 📖 Continuous/In-flight Batching (14)
- 📖 Early-Exit/Intermediate Layer Decoding (14)
- 📖 DeepSeek/Multi-head Latent Attention (MLA) (13)
- 📖 Prompt/Context/KV Compression (12)
- 📖 CPU/Single GPU/FPGA/NPU/Mobile Inference (11)
- 📖 DP/MP/PP/TP/SP/CP Parallelism (9)
- 📖 Structured Prune/KD/Weight Sparse (9)
- 📖 Mixture-of-Experts (MoE) LLM Inference (9)
- 📖 Non Transformer Architecture (8)
- 📖 LLM Train/Inference Framework (7)
- 📖 GEMM/Tensor Cores/WMMA/Parallel (6)
- 📖 CPU/Single GPU/FPGA/Mobile Inference (5)
- 📖 VLM/Position Embed/Others (5)
- 📖 Disaggregating Prefill and Decoding (4)
- 📖 Prompt/Context Compression (4)
- 📖 Trending LLM/VLM Topics (3)
- 📖 Position Embed/Others (2)