Awesome-Efficient-AIGC

A list of papers, documents, and code about efficient AIGC. This repo aims to gather resources for efficient-AIGC research, covering both language and vision, and is continuously being improved. Pull requests adding works (papers, repositories) the list has missed are welcome. Short illustrative code sketches after some of the year lists below show, in miniature, the techniques the entries build on.
https://github.com/Efficient-ML/Awesome-Efficient-AIGC


  • Language

    • 2023

      • [EMNLP] LLM-FP4: 4-Bit Floating-Point Quantized Transformers [[code](https://github.com/nbasyl/LLM-FP4)] ![GitHub Repo stars](https://img.shields.io/github/stars/nbasyl/LLM-FP4)
      • [ArXiv
      • [ICLR] OPTQ: Accurate Quantization for Generative Pre-trained Transformers [[code](https://github.com/IST-DASLab/gptq)] ![GitHub Repo stars](https://img.shields.io/github/stars/IST-DASLab/gptq)
      • [ArXiv] LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
      • [ICLR] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers [[code](https://github.com/IST-DASLab/gptq)] ![GitHub Repo stars](https://img.shields.io/github/stars/IST-DASLab/gptq) (a round-to-nearest baseline sketch follows this year's list)
      • [ICML] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot [[code](https://github.com/IST-DASLab/sparsegpt)] ![GitHub Repo stars](https://img.shields.io/github/stars/IST-DASLab/sparsegpt)
      • [NeurIPS] Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models [[code](https://github.com/wimh966/outlier_suppression)] ![GitHub Repo stars](https://img.shields.io/github/stars/wimh966/outlier_suppression)
      • [ICML] FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization [[code](https://openreview.net/attachment?id=-tYCaP0phY_&name=supplementary_material)]
      • [ICML
      • [ACL] PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models
      • [ArXiv
      • [ArXiv] RPTQ: Reorder-based Post-training Quantization for Large Language Models [[code](https://github.com/hahnyuan/RPTQ4LLM)] ![GitHub Repo stars](https://img.shields.io/github/stars/hahnyuan/RPTQ4LLM)
      • [ArXiv] ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
      • [ArXiv] Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
      • [ArXiv] Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
      • [ArXiv] Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
      • [ArXiv] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration [[code](https://github.com/mit-han-lab/llm-awq)] ![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/llm-awq)
      • [ArXiv] LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
      • [ArXiv] SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression [[code](https://github.com/Vahe1994/SpQR)] ![GitHub Repo stars](https://img.shields.io/github/stars/Vahe1994/SpQR)
      • [ArXiv
      • [ArXiv] SqueezeLLM: Dense-and-Sparse Quantization [[code](https://github.com/SqueezeAILab/SqueezeLLM)] ![GitHub Repo stars](https://img.shields.io/github/stars/SqueezeAILab/SqueezeLLM)
      • [ArXiv] INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
      • [ArXiv] INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers [[code](https://github.com/lightmatter-ai/INT-FP-QSim)] ![GitHub Repo stars](https://img.shields.io/github/stars/lightmatter-ai/INT-FP-QSim)
      • [ICML] QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models [[code](https://github.com/IST-DASLab/QIGen)] ![GitHub Repo stars](https://img.shields.io/github/stars/IST-DASLab/QIGen)
      • [ArXiv
      • [ArXiv] ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
      • [ArXiv] NUPES: Non-Uniform Post-Training Quantization via Power Exponent Search
      • [ArXiv] Gradient-Based Post-Training Quantization: Challenging the Status Quo
      • [ArXiv] FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
      • [ArXiv
      • [ArXiv] FPTQ: Fine-grained Post-Training Quantization for Large Language Models
      • [ArXiv] eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
      • [ArXiv] QuantEase: Optimization-based Quantization for Language Models - An Efficient and Intuitive Algorithm
      • [ArXiv] Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
      • [ArXiv] Understanding the Impact of Post-Training Quantization on Large Language Models
      • [ArXiv] MEMORY-VQ: Compression for Tractable Internet-Scale Memory
      • [ArXiv] Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs [[code](https://github.com/intel/neural-compressor)] ![GitHub Repo stars](https://img.shields.io/github/stars/intel/neural-compressor)
      • [ArXiv] Efficient Post-training Quantization with FP8 Formats [[code](https://github.com/intel/neural-compressor)] ![GitHub Repo stars](https://img.shields.io/github/stars/intel/neural-compressor)
      • [ArXiv] QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models [[code](https://github.com/yuhuixu1993/qa-lora)]
      • [ArXiv] Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models
      • [ArXiv] ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
      • [ArXiv] PB-LLM: Partially Binarized Large Language Models [[code](https://github.com/hahnyuan/BinaryLLM)] ![GitHub Repo stars](https://img.shields.io/github/stars/hahnyuan/BinaryLLM)
      • [ArXiv] Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
      • [ArXiv] QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
      • [ArXiv] LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models [[code](https://github.com/yxli2123/LoftQ)]
      • [ArXiv] QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
      • [ArXiv] TEQ: Trainable Equivalent Transformation for Quantization of LLMs [[code](https://github.com/intel/neural-compressor)] ![GitHub Repo stars](https://img.shields.io/github/stars/intel/neural-compressor)
      • [ArXiv] BitNet: Scaling 1-bit Transformers for Large Language Models [[code](https://github.com/kyegomez/BitNet)] ![GitHub Repo stars](https://img.shields.io/github/stars/kyegomez/BitNet)
      • [ArXiv] FP8-LM: Training FP8 Large Language Models [[code](https://github.com/Azure/MS-AMP)] ![GitHub Repo stars](https://img.shields.io/github/stars/Azure/MS-AMP)
      • [ArXiv
      • [ArXiv] AWEQ: Post-Training Quantization with Activation-Weight Equalization for Large Language Models
      • [ArXiv] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving [[code](https://github.com/efeslab/Atom)] ![GitHub Repo stars](https://img.shields.io/github/stars/efeslab/Atom)
      • [ArXiv] How Does Calibration Data Affect the Post-training Pruning and Quantization of Large Language Models?
      • [ArXiv
      • [EMNLP] Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models
      • [EMNLP] Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
      • [EMNLP
      • [EMNLP
      • [ICML] LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation [[code](https://github.com/yxli2123/LoSparse)] ![GitHub Repo stars](https://img.shields.io/github/stars/yxli2123/LoSparse)
      • [NeurIPS] LLM-Pruner: On the Structural Pruning of Large Language Models [[code](https://github.com/horseee/LLM-Pruner)] ![GitHub Repo stars](https://img.shields.io/github/stars/horseee/LLM-Pruner)
      • [ICML
      • [ArXiv] LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning
      • [ArXiv
      • [VLDB] Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity [[code](https://github.com/AlibabaResearch/flash-llm)] ![GitHub Repo stars](https://img.shields.io/github/stars/AlibabaResearch/flash-llm)
      • [ArXiv] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity [[code](https://github.com/VITA-Group/Junk_DNA_Hypothesis)] ![GitHub Repo stars](https://img.shields.io/github/stars/VITA-Group/Junk_DNA_Hypothesis)
      • [ArXiv
      • [ArXiv
      • [ArXiv
      • [ArXiv] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning [[code](https://github.com/princeton-nlp/LLM-Shearing)] ![GitHub Repo stars](https://img.shields.io/github/stars/princeton-nlp/LLM-Shearing)
      • [ArXiv] Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs [[code](https://github.com/zyxxmu/DSnoT)] ![GitHub Repo stars](https://img.shields.io/github/stars/zyxxmu/DSnoT)
      • [ArXiv] One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models
      • [ArXiv
      • [ArXiv] E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity
      • [ArXiv] LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions [[code](https://github.com/mbzuai-nlp/LaMini-LM)]
      • [ArXiv
      • [ArXiv
      • [ArXiv] PaD: Program-aided Distillation Specializes Large Models in Reasoning
      • [ArXiv
      • [ArXiv] GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models
      • [ArXiv] Chain-of-Thought Prompt Distillation for Multimodal Named Entity Recognition and Multimodal Relation Extraction
      • [ArXiv] Task-agnostic Distillation of Encoder-Decoder Language Models
      • [ArXiv] Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
      • [ArXiv
      • [ArXiv] Lion: Adversarial Distillation of Closed-Source Large Language Model [[code](https://github.com/YJiangcm/Lion)]
      • [EMNLP] MCC-KD: Multi-CoT Consistent Knowledge Distillation
      • [EMNLP - EMNLP-2023)]
      • [EMNLP] Batch Prompting: Efficient Inference with Large Language Model APIs [[code](https://github.com/xlang-ai/batch-prompting)]
      • [EMNLP] Adapting Language Models to Compress Contexts [[code](https://github.com/princeton-nlp/AutoCompressors)]
      • [EMNLP
      • [EMNLP
      • [ArXiv] Efficient Prompting via Dynamic In-Context Learning
      • [ArXiv
      • [ArXiv] In-context Autoencoder for Context Compression in a Large Language Model [[code](https://github.com/getao/icae)]
      • [ArXiv
      • [ArXiv
      • [ArXiv
      • [ArXiv] RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation [[code](https://github.com/carriex/recomp)]
      • [ArXiv
      • [ArXiv] LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models [[code](https://github.com/dvlab-research/LongLoRA)]
      • [NeurIPS
      • [NeurIPS] Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
      • [ICML - Zip: Deep Compression of Finetuned Large Language Models
      • [ICML] The case for 4-bit precision: k-bit Inference Scaling Laws
      • [ACL] Boost Transformer-based Language Models with GPU-Friendly Sparsity and Quantization
      • [ArXiv] Training Transformers with 4-bit Integers [[code](https://github.com/xijiu9/Train_Transformers_with_INT4)] ![GitHub Repo stars](https://img.shields.io/github/stars/xijiu9/Train_Transformers_with_INT4)
      • [ArXiv] QuIP: 2-Bit Quantization of Large Language Models With Guarantees [[code](https://github.com/jerry-chee/QuIP)] ![GitHub Repo stars](https://img.shields.io/github/stars/jerry-chee/QuIP)
      • [ArXiv] Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
      • [ICML] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models [[code](https://github.com/mit-han-lab/smoothquant)] ![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/smoothquant)
      • [ArXiv] LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning [[code](https://github.com/HanGuo97/lq-lora)]
      • [ICML
      • [ICLR
      • [AutoML
      • [ArXiv
      • [ACL] Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step [[code](https://github.com/allenai/cot_distillation)]
      • [ACL
      • [ACL
      • [ACL] SCOTT: Self-Consistent Chain-of-Thought Distillation [[code](https://github.com/wangpf3/consistent-CoT-distillation)]
      • [ACL] AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression
      • [ACL - teacher)]
      • [ACL
      • [ACL] Cost-effective Distillation of Large Language Models [[code](https://github.com/Sayan21/MAKD)]
      • [ACL] Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes [[code](https://github.com/google-research/distilling-step-by-step)]
      • [EMNLP] Democratizing Reasoning Ability: Tailored Learning from Large Language Model [[code](https://github.com/Raibows/Learn-to-Reason)]
      • [NeurIPS] Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning [[code](https://github.com/BaohaoLiao/mefts)]
      • [ArXiv] S-LoRA: Serving Thousands of Concurrent LoRA Adapters [[code](https://github.com/S-LoRA/S-LoRA)]
      • [ACL - instruction-effectiveness)]
      • [ArXiv] HyperAttention: Long-context Attention in Near-Linear Time
      • [ISCA] OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization
      • [Nature] Parameter-efficient fine-tuning of large-scale pre-trained language models [[code](https://github.com/thunlp/OpenDelta)]
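Most of the post-training quantization entries above (GPTQ, AWQ, SmoothQuant, and others) measure themselves against a shared round-to-nearest (RTN) weight-only baseline. Below is a minimal sketch of that baseline in plain PyTorch; the function names and the symmetric per-output-channel scheme are illustrative choices, not any listed paper's implementation.

```python
# A minimal round-to-nearest (RTN) weight-only quantization sketch.
# Illustrative only: real methods add group-wise scales, error compensation
# (GPTQ), or activation-aware channel scaling (AWQ) on top of this.
import torch

def quantize_rtn(weight: torch.Tensor, n_bits: int = 4):
    """Symmetric per-output-channel RTN for a [out, in] linear weight."""
    qmax = 2 ** (n_bits - 1) - 1                    # 7 for INT4
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)                   # guard against all-zero rows
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize_rtn(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(256, 512)
q, s = quantize_rtn(w)
print("mean abs rounding error:", (w - dequantize_rtn(q, s)).abs().mean().item())
```

GPTQ-style methods reduce this rounding error by adjusting the not-yet-quantized weights column by column; AWQ instead rescales salient channels before rounding.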
    • 2022

      • [NeurIPS] LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
      • [ICML
      • [ICLR
      • [NeurIPS] BiT: Robustly Binarized Multi-distilled Transformer [[code](https://github.com/facebookresearch/bit)]
      • [NeurIPS] Towards Efficient Post-training Quantization of Pre-trained Language Models
      • [NeurIPS] ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
      • [ArXiv
      • [ArXiv] In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models (a soft-label distillation sketch follows this list)
      • [ACL] Petals: Collaborative Inference and Fine-tuning of Large Models [[code](https://petals.ml/)]
      • [ACL] Compression of Generative Pre-trained Language Models via Quantization
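Many distillation entries above start from the classic soft-label objective: a KL divergence between temperature-scaled teacher and student logits, mixed with the usual hard-label cross-entropy. A minimal sketch follows; `temperature` and `alpha` are illustrative hyperparameters, not values taken from any listed paper.

```python
# A minimal soft-label knowledge distillation loss (Hinton-style).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    # KL on softened distributions; the T^2 factor keeps gradient
    # magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 32000, requires_grad=True)  # toy vocab-sized logits
teacher = torch.randn(8, 32000)
labels = torch.randint(0, 32000, (8,))
distillation_loss(student, teacher, labels).backward()
```

Sequence-level and chain-of-thought distillation replace or augment the soft targets with teacher-generated text, but the student loss usually keeps this shape.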
    • 2021

      • [ICML] I-BERT: Integer-only BERT Quantization
      • [ACL
      • [ACL] One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers
      • [ACL] On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers
    • 2020

      • [AAAI] Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
      • [EMNLP] TernaryBERT: Distillation-aware Ultra-low Bit BERT
      • [EMNLP
      • [IJCAI] Towards Fully 8-bit Integer Inference for the Transformer Model
      • [ACL
      • [ICLR
      • [MICRO] GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
    • 2024

      • [ArXiv
      • [ArXiv
      • [ArXiv - Efficient Tuning of Quantized Large Language Models
      • [ArXiv
      • [ArXiv] FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
      • [ArXiv
      • [ICLR
      • [ArXiv] L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ
      • [ArXiv] ApiQ: Finetuning of 2-Bit Quantized Large Language Model
      • [ArXiv] EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge [[code](https://github.com/shawnricecake/EdgeQAT)] ![GitHub Repo stars](https://img.shields.io/github/stars/shawnricecake/EdgeQAT)
      • [ArXiv] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
      • [ArXiv] LQER: Low-Rank Quantization Error Reconstruction for LLMs
      • [ArXiv] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache [[code](https://github.com/jy-yuan/KIVI)] ![GitHub Repo stars](https://img.shields.io/github/stars/jy-yuan/KIVI) (a KV-cache quantization sketch follows this year's list)
      • [ArXiv] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [[code](https://github.com/Aaronhuang-778/BiLLM)] ![GitHub Repo stars](https://img.shields.io/github/stars/Aaronhuang-778/BiLLM)
      • [ArXiv] QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks [[code](https://github.com/Cornell-RelaxML/quip-sharp)] ![GitHub Repo stars](https://img.shields.io/github/stars/Cornell-RelaxML/quip-sharp)
      • [ArXiv - Aware Dequantization
      • [ArXiv] Accurate LoRA-Finetuning Quantization of LLMs via Information Retention [[code](https://github.com/htqin/IR-QLoRA)] ![GitHub Repo stars](https://img.shields.io/github/stars/htqin/IR-QLoRA)
      • [ArXiv] BitDelta: Your Fine-Tune May Only Be Worth One Bit [[code](https://github.com/FasterDecoding/BitDelta)] ![GitHub Repo stars](https://img.shields.io/github/stars/FasterDecoding/BitDelta)
      • [AAAI EIW Workshop 2024] QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning
      • [ArXiv] BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation [[code](https://github.com/DD-DuDa/BitDistiller)] ![GitHub Repo stars](https://img.shields.io/github/stars/DD-DuDa/BitDistiller)
      • [ArXiv] OneBit: Towards Extremely Low-bit Large Language Models
      • [ArXiv] DB-LLM: Accurate Dual-Binarization for Efficient LLMs
      • [ArXiv
      • [DAC
      • [ArXiv - Aware Mixed Precision Quantization
      • [ArXiv - bound for Large Language Models with Per-tensor Quantization
      • [ArXiv] GPTVQ: The Blessing of Dimensionality for LLM Quantization [[code](https://github.com/qualcomm-ai-research/gptvq)] ![GitHub Repo stars](https://img.shields.io/github/stars/qualcomm-ai-research/gptvq)
      • [DAC] APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
      • [ArXiv
      • [ArXiv] LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
      • [ArXiv
      • [ArXiv
      • [ArXiv - free Quantization Algorithm for LLMs
      • [ArXiv] QAQ: Quality Adaptive Quantization for LLM KV Cache [[code](https://github.com/ClubieDong/QAQ-KVCacheQuantization)] ![GitHub Repo stars](https://img.shields.io/github/stars/ClubieDong/QAQ-KVCacheQuantization)
      • [ArXiv
      • [ArXiv] GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
      • [ArXiv] SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression [[code](https://github.com/AIoT-MLSys-Lab/SVD-LLM)] ![GitHub Repo stars](https://img.shields.io/github/stars/AIoT-MLSys-Lab/SVD-LLM)
      • [ICLR Practical ML for Low Resource Settings Workshop
      • [ArXiv
      • [ArXiv
      • [ArXiv] QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs [[code](https://github.com/spcl/QuaRot)] ![GitHub Repo stars](https://img.shields.io/github/stars/spcl/QuaRot)
      • [ArXiv] Minimize Quantization Output Error with Bias Compensation [[code](https://github.com/GongCheng1919/bias-compensation)] ![GitHub Repo stars](https://img.shields.io/github/stars/GongCheng1919/bias-compensation)
      • [ArXiv] How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study [[code](https://github.com/Macaronlin/LLaMA3-Quantization)] ![GitHub Repo stars](https://img.shields.io/github/stars/Macaronlin/LLaMA3-Quantization) [[HuggingFace](https://huggingface.co/LLMQ)]
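Several 2024 entries (KIVI, QAQ, GEAR) quantize the KV cache rather than the weights. The sketch below shows the common asymmetric min/max scheme, grouping keys per-channel and values per-token as KIVI reports works well; bit-packing, group sizes, and outlier/residual handling are omitted, so this is an illustration rather than any paper's kernel.

```python
# A minimal asymmetric low-bit KV-cache quantization sketch.
import torch

def quantize_asym(x: torch.Tensor, n_bits: int, dim: int):
    """Asymmetric min/max quantization, with statistics reduced along `dim`."""
    qmax = 2 ** n_bits - 1
    lo = x.amin(dim=dim, keepdim=True)
    hi = x.amax(dim=dim, keepdim=True)
    scale = (hi - lo).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round((x - lo) / scale), 0, qmax)
    return q.to(torch.uint8), scale, lo

def dequantize_asym(q, scale, lo):
    return q.float() * scale + lo

k = torch.randn(1, 128, 64)              # [batch, seq_len, head_dim]
v = torch.randn(1, 128, 64)
qk = quantize_asym(k, n_bits=2, dim=1)   # keys: stats shared across tokens (per-channel)
qv = quantize_asym(v, n_bits=2, dim=2)   # values: stats shared across channels (per-token)
print((k - dequantize_asym(*qk)).abs().mean().item(),
      (v - dequantize_asym(*qv)).abs().mean().item())
```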
  • Survey

    • [Arxiv
    • [Arxiv
    • [Arxiv] The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [[code](https://github.com/tding1/Efficient-LLM-Survey)] ![GitHub Repo stars](https://img.shields.io/github/stars/tding1/Efficient-LLM-Survey)
    • [Arxiv] Efficient Large Language Models: A Survey [[code](https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey)] ![GitHub Repo stars](https://img.shields.io/github/stars/AIoT-MLSys-Lab/Efficient-LLMs-Survey)
    • [TACL] Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
    • [Arxiv
    • [Arxiv
    • [Arxiv
    • [Arxiv] A Survey of Resource-efficient LLM and Multimodal Foundation Models [[code](https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey)] ![GitHub Repo stars](https://img.shields.io/github/stars/UbiquitousLearning/Efficient_Foundation_Model_Survey)
    • [Arxiv
    • [Arxiv
    • [Arxiv
    • [Arxiv] A Survey on Knowledge Distillation of Large Language Models [[code](https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs)] ![GitHub Repo stars](https://img.shields.io/github/stars/Tebmer/Awesome-Knowledge-Distillation-of-LLMs)
    • [Arxiv
    • [Arxiv] Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward [[code](https://github.com/nyunAI/Faster-LLM-Survey)] ![GitHub Repo stars](https://img.shields.io/github/stars/nyunAI/Faster-LLM-Survey)
    • [Arxiv] Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding [[code](https://github.com/hemingkx/Spec-Bench)] ![GitHub Repo stars](https://img.shields.io/github/stars/hemingkx/Spec-Bench) [[Blog]](https://sites.google.com/view/spec-bench)
    • [Arxiv
    • [Arxiv] Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models [[code](https://github.com/tiingweii-shii/Awesome-Resource-Efficient-LLM-Papers)] ![GitHub Repo stars](https://img.shields.io/github/stars/tiingweii-shii/Awesome-Resource-Efficient-LLM-Papers)
  • Vision

    • 2023

      • [ArXiv] EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
      • [ArXiv] Spiking-Diffusion: Vector Quantized Discrete Diffusion Model with Spiking Neural Networks [[code](https://github.com/Arktis2022/Spiking-Diffusion)]
      • [ArXiv] Towards Accurate Data-free Quantization for Diffusion Models
      • [ArXiv] Structural Pruning for Diffusion Models [[code](https://github.com/VainF/Diff-Pruning)]
      • [ArXiv] Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling [[code](https://anonymous.4open.science/r/Catch-Up-Distillation-E31F)]
      • [CVPR
      • [NeurIPS] Q-DM: An Efficient Low-bit Quantized Diffusion Model
      • [NeurIPS] PTQD: Accurate Post-Training Quantization for Diffusion Models [[code](https://github.com/ziplab/PTQD)] ![GitHub Repo stars](https://img.shields.io/github/stars/ziplab/PTQD)
      • [NeurIPS
      • [ICCV] Q-Diffusion: Quantizing Diffusion Models [[code](https://github.com/Xiuyu-Li/q-diffusion)] ![GitHub Repo stars](https://img.shields.io/github/stars/Xiuyu-Li/q-diffusion)
      • [CVPR] Post-training Quantization on Diffusion Models [[code](https://github.com/42Shawn/PTQ4DM)] ![GitHub Repo stars](https://img.shields.io/github/stars/42Shawn/PTQ4DM) (a timestep-aware calibration sketch follows this list)
      • [ICLR - Conditioning
      • [ArXiv] Finite Scalar Quantization: VQ-VAE Made Simple [[code](https://github.com/google-research/google-research/tree/master/fsq)] ![GitHub Repo stars](https://img.shields.io/github/stars/google-research/google-research)
      • [TPAMI
      • [CVPR
      • [ICME] Accelerating Diffusion Sampling with Classifier-based Feature Distillation [[code](https://github.com/zju-SWJ/RCFD)]
      • [ICML] Accelerating Diffusion-based Combinatorial Optimization Solvers by Progressive Distillation [[code](https://github.com/jwrh/Accelerating-Diffusion-based-Combinatorial-Optimization-Solvers-by-Progressive-Distillation)]
      • [ICML] Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models [[code](https://github.com/nannullna/safe-diffusion)]
      • [ArXiv] BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping
      • [ArXiv] SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds
      • [ArXiv
      • [ArXiv - to-Image Diffusion Models
      • [ArXiv
      • [ArXiv] AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
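A recurring observation in the diffusion PTQ entries above (PTQ4DM, Q-Diffusion, PTQD) is that activation distributions drift across denoising timesteps, so calibration data must be drawn from many timesteps rather than one. The sketch below shows only that calibration idea; the stand-in `model`, its call signature, and the crude noising step are placeholders, not a real diffusion API or scheduler.

```python
# A minimal timestep-aware activation-calibration sketch for diffusion PTQ.
import torch

@torch.no_grad()
def calibrate_activation_scale(model, x0, timesteps, n_bits: int = 8) -> float:
    """Track the activation range over several timesteps; return one scale."""
    amax = 0.0
    for t in timesteps:
        noise = torch.randn_like(x0)
        xt = 0.5 * x0 + 0.5 * noise         # toy noising; real code uses alpha-bar schedules
        act = model(xt, torch.tensor([t]))  # placeholder signature
        amax = max(amax, act.abs().max().item())
    return amax / (2 ** (n_bits - 1) - 1)

model = lambda x, t: torch.tanh(x) * (1 + 0.01 * t.float())  # stand-in network
scale = calibrate_activation_scale(model, torch.randn(4, 3, 32, 32), range(0, 1000, 50))
print("shared activation scale:", scale)
```

Calibrating at a single timestep would systematically under- or over-estimate this range, which is exactly the failure mode these papers correct.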
    • 2024

      • [ArXiv] BinaryDM: Towards Accurate Binarization of Diffusion Model [[code](https://github.com/Xingyu-Zheng/BinaryDM)] ![GitHub Repo stars](https://img.shields.io/github/stars/Xingyu-Zheng/BinaryDM)
  • Awesome-Repo

  • Star History