Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Awesome-LLM-Inference

📖A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
https://github.com/DefTruth/Awesome-LLM-Inference

Last synced: 3 days ago

Sub Categories
📖KV Cache Scheduling/Quantize/Dropping: 65
📖Weight/Activation Quantize/Compress: 50
📖Long Context Attention/KV Cache Optimization: 38
📖IO/FLOPs-Aware/Sparse Attention: 32
📖Parallel Decoding/Sampling: 30
📖LLM Algorithmic/Eval Survey: 23
📖LLM Train/Inference Framework/Design: 18
📖GEMM/Tensor Cores/MMA/Parallel: 16
📖Early-Exit/Intermediate Layer Decoding: 14
📖Prompt/Context/KV Compression: 12
📖Continuous/In-flight Batching: 12
📖CPU/Single GPU/FPGA/NPU/Mobile Inference: 10
📖Structured Prune/KD/Weight Sparse: 9
📖Mixture-of-Experts(MoE) LLM Inference: 9
📖Non Transformer Architecture: 8
📖LLM Train/Inference Framework: 7
📖GEMM/Tensor Cores/WMMA/Parallel: 6
📖CPU/Single GPU/FPGA/Mobile Inference: 5
📖Data/Model/Pipeline/Tensor/Sequence/Context Parallelism: 5
📖VLM/Position Embed/Others: 4
📖Prompt/Context Compression: 4
📖Position Embed/Others: 2
📖Trending LLM/VLM Topics: 2