Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome-Efficient-LLM

A curated list for Efficient Large Language Models
https://github.com/horseee/Awesome-Efficient-LLM

  • LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning
  • [Adversarial Moment-Matching Distillation of Large Language Models](https://arxiv.org/abs/2406.02959) · Chen Jia · [GitHub](https://github.com/jiachenwestlake/MMKD)
  • [Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models](https://arxiv.org/abs/2406.02924) (ICML'24) · Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu · [GitHub](https://github.com/pprp/Pruner-Zero) (a magnitude-pruning baseline sketch appears after this list)
  • Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
  • VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning
  • [Large Language Model Pruning](https://arxiv.org/abs/2406.00030) · …-Jia Song, Hsing-Kuo Pao
  • FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
  • [SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs](https://arxiv.org/abs/2405.16325) · Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang, Maryam Mehri Dehnavi · [GitHub](https://github.com/Mohammad-Mozaffari/slope)
  • [SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models](https://arxiv.org/abs/2405.16057) · Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li · [GitHub](https://github.com/Lucky-Lance/SPP)
  • [ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization](https://arxiv.org/abs/2406.05981) · Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Lin · [GitHub](https://github.com/GATECH-EIC/ShiftAddLLM)
  • Low-Rank Quantization-Aware Training for LLMs
  • [LCQ: Low-Rank Codebook based Quantization for Large Language Models](https://arxiv.org/abs/2405.20973) · …-Pu Cai, Wu-Jun Li
  • MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
  • [Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs](https://arxiv.org/abs/2405.20835)
  • [I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models](https://arxiv.org/abs/2405.17849)
  • [Exploiting LLM Quantization](https://arxiv.org/abs/2405.18137) · Kazuki Egashira, Mark Vero, Robin Staab, Jingxuan He, Martin Vechev · [GitHub](https://github.com/eth-sri/llm-quantization-attack)
  • CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
  • SpinQuant: LLM quantization with learned rotations (a round-to-nearest baseline sketch appears after this list)
  • [SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models](https://arxiv.org/abs/2405.14917) · Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Xianglong Liu, Luca Benini, Michele Magno, Xiaojuan Qi · [GitHub](https://github.com/Aaronhuang-778/SliM-LLM)
  • [PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression](https://arxiv.org/abs/2405.14852) · Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtarik · [GitHub](https://github.com/Vahe1994/AQLM/tree/pv-tuning)
  • Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs
  • [When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models](https://arxiv.org/abs/2406.07368) (ICML'24) · Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan (Celine) Lin · [GitHub](https://github.com/GATECH-EIC/Linearized-LLM)
  • [QuickLLaMA: Query-aware Inference Acceleration for Large Language Models](https://arxiv.org/abs/2406.07528) · Jingyao Li, Han Shi, Xin Jiang, Zhenguo Li, Hong Xu, Jiaya Jia · [GitHub](https://github.com/dvlab-research/Q-LLM)
  • [Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism](https://arxiv.org/abs/2406.03853) · Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai
  • Faster Cascades via Speculative Decoding (a minimal speculative-decoding sketch appears after this list)
  • [Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference](https://arxiv.org/abs/2405.18628) · Hao (Mark) Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan · [GitHub](https://github.com/hmarkc/parallel-prompt-decoding)
  • Demystifying the Compression of Mixture-of-Experts Through a Unified Framework
  • [MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models](https://arxiv.org/abs/2405.18832) · Taehyun Kim, Kwanseok Choi, Youngmock Cho, Jaehoon Cho, Hyuk-Jae Lee, Jaewoong Sim
  • [Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models](https://arxiv.org/abs/2405.14297) · Yongxin Guo, Zhenglin Cheng, Xiaoying Tang, Tao Lin · [GitHub](https://github.com/LINs-lab/DynMoE)
  • [A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts](https://arxiv.org/abs/2405.16646) · Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers
  • [Block Transformer: Global-to-Local Language Modeling for Fast Inference](https://arxiv.org/abs/2406.02657) · Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik Jo, Yireun Kim, Tal Schuster, Adam Fisch, James Thorne, Se-Young Yun · [GitHub](https://github.com/itsnamgyu/block-transformer)
  • Loki: Low-Rank Keys for Efficient Sparse Attention
  • [QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead](https://arxiv.org/abs/2406.03482) · Amir Zandieh, Majid Daliri, Insu Han · [GitHub](https://github.com/amirzandieh/QJL)
  • ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification (a per-token KV-quantization sketch appears after this list)
  • PowerInfer-2: Fast Large Language Model Inference on a Smartphone
  • [Parrot: Efficient Serving of LLM-based Applications with Semantic Variable](https://arxiv.org/abs/2405.19888) · Chaofan Lin, Zhenhua Han, Chengruidong Zhang, Yuqing Yang, Fan Yang, Chen Chen, Lili Qiu
  • [Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning](https://arxiv.org/abs/2406.03792) (ACL'24 Findings) · Naibin Gu, Peng Fu, Xiyu Liu, Bowen Shen, Zheng Lin, Weiping Wang · [GitHub](https://github.com/gccnlp/Light-PEFT)
  • Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity
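
Baseline code sketches

The sketches below are minimal illustrations of the naive baselines that many listed papers improve on; they are not implementations of any specific paper, and all function names are invented for illustration. First, unstructured magnitude pruning, the classic criterion that entries such as Pruner-Zero (which evolves symbolic pruning metrics) and SLoPe build on:

```python
# A minimal sketch of unstructured magnitude pruning (illustrative only).
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction `sparsity` of a weight matrix."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest |w|
    return weight * (weight.abs() > threshold)             # keep large weights

w = torch.randn(4096, 4096)
w_sparse = magnitude_prune(w, sparsity=0.5)
print(f"sparsity: {(w_sparse == 0).float().mean():.2f}")   # ~0.50
```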
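Next, a round-to-nearest (RTN) weight quantizer with per-output-channel symmetric scales: the naive post-training baseline that MagR, SliM-LLM, SpinQuant and others refine (for example, by shrinking weight magnitudes or learning rotations before rounding). This "fake-quant" form quantizes and immediately dequantizes so the error can be inspected in floating point:

```python
# A minimal round-to-nearest weight quantizer (illustrative baseline only).
import torch

def quantize_rtn(weight: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Fake-quantize [out, in] weights: round to an integer grid, scale back."""
    qmax = 2 ** (bits - 1) - 1                              # e.g. 7 for 4 bits
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax   # one scale per row
    scale = scale.clamp(min=1e-8)                           # avoid div-by-zero
    q = torch.round(weight / scale).clamp(-qmax - 1, qmax)  # integer codes
    return q * scale                                        # back to float

w = torch.randn(4096, 11008)
err = (quantize_rtn(w, bits=4) - w).abs().mean()
print(f"mean abs rounding error: {err:.4f}")
```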
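Third, a greedy sketch of speculative decoding, the mechanism behind the cascade, early-exit, and parallel-decoding entries above: a small draft model proposes several tokens cheaply, and the large target model verifies them all in one forward pass. This assumes Hugging Face-style causal LMs whose forward returns `.logits`, a shared tokenizer, and batch size 1; production systems add KV caching and a probabilistic acceptance rule rather than exact greedy matching:

```python
# A minimal greedy speculative-decoding loop (illustrative, batch size 1).
import torch

@torch.no_grad()
def speculative_greedy(target, draft, ids, k=4, max_new=64):
    """ids: [1, seq] prompt token ids; returns ids extended by ~max_new tokens."""
    for _ in range(max_new // k):
        # 1) The small draft model proposes k tokens autoregressively (cheap).
        prop = ids
        for _ in range(k):
            nxt = draft(prop).logits[:, -1].argmax(-1, keepdim=True)
            prop = torch.cat([prop, nxt], dim=-1)
        # 2) The target model scores all k proposals in ONE forward pass;
        #    logits[:, -k-1:-1] are its predictions at each drafted position.
        tgt = target(prop).logits[:, -k - 1:-1].argmax(-1)    # [1, k]
        # 3) Accept drafted tokens up to the first disagreement.
        n = int((tgt == prop[:, -k:]).long().cumprod(-1).sum())
        if n == k:
            ids = prop                                        # all accepted
        else:
            # keep the n accepted tokens, then the target's own next token
            ids = torch.cat([prop[:, :ids.shape[1] + n], tgt[:, n:n + 1]], dim=-1)
    return ids
```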
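Finally, a per-token asymmetric KV-cache quantizer: the uniform baseline that QJL (1-bit JL transforms), ZipCache (salient-token identification), and Loki (low-rank keys) each improve on. Shapes follow the usual [batch, heads, seq, head_dim] cache layout; everything else is illustrative:

```python
# A minimal per-token KV-cache quantizer (illustrative baseline only).
import torch

def quantize_kv(x: torch.Tensor, bits: int = 8):
    """x: [batch, heads, seq, head_dim] keys or values.
    Returns uint8 codes plus a per-token scale and zero point."""
    qmax = 2 ** bits - 1
    lo = x.amin(dim=-1, keepdim=True)                       # per-token minimum
    scale = (x.amax(dim=-1, keepdim=True) - lo).clamp(min=1e-5) / qmax
    codes = torch.round((x - lo) / scale).to(torch.uint8)
    return codes, scale, lo

def dequantize_kv(codes, scale, lo):
    return codes.to(scale.dtype) * scale + lo

k = torch.randn(1, 32, 1024, 128)
codes, scale, lo = quantize_kv(k)                           # ~4x smaller than fp32
print((dequantize_kv(codes, scale, lo) - k).abs().max())    # small reconstruction error
```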