Awesome-Efficient-AIGC

A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, covering both language and vision, and we are continuously improving the project. PRs adding works (papers, repositories) that the repo has missed are welcome.
https://github.com/Efficient-ML/Awesome-Efficient-AIGC

Awesome-Repo
- 2023
Language
- 2019
  - [ICML - Bit Quantization of Transformer Neural Machine Language Translation Model
  - [NeurIPS
  - [NeurIPS
  - [NeurIPS
- 2020
  - [EMNLP
  - [IJCAI - bit Integer Inference for the Transformer Model
  - [EMNLP - aware Ultra-low Bit BERT
  - [AAAI - BERT: Hessian Based Ultra Low Precision Quantization of BERT
  - [ACL
  - [ICLR
  - [MICRO - Based NLP Models for Low Latency and Energy Efficient Inference
- 2021
  - [ICML - BERT: Integer-only BERT Quantization
  - [ACL
  - [ACL - trained Language Model Distillation from Multiple Teachers
  - [ACL - time Quantization of Attention Values in Transformers
- 2022
  - [NeurIPS - bit Matrix Multiplication for Transformers at Scale
  - [NeurIPS - training Quantization of Pre-trained Language Models
  - [NeurIPS - Training Quantization for Large-Scale Transformers
  - [NeurIPS - distilled Transformer [[code](https://github.com/facebookresearch/bit)]
  - [ICLR
  - [ArXiv
  - [ArXiv - context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
  - [ACL - tuning of Large Models [[code](https://petals.ml/)]
  - [ICML
  - [ACL - trained Language Models via Quantization
- 2023
  - [ICLR - Training Quantization for Generative Pre-trained Transformers [[code](https://github.com/IST-DASLab/gptq)] ![GitHub Repo stars](https://img.shields.io/github/stars/IST-DASLab/gptq)
  - [NeurIPS
  - [NeurIPS - Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
  - [ICML - Training Quantization for Large Language Models [[code](https://github.com/mit-han-lab/smoothquant)] ![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/smoothquant)
  - [ICML - wise Division for Post-Training Quantization [[code](https://openreview.net/attachment?id=-tYCaP0phY_&name=supplementary_material)]
  - [ICML
  - [ICML - Zip: Deep Compression of Finetuned Large Language Models
  - [ICML - DASLab/QIGen)] ![GitHub Repo stars](https://img.shields.io/github/stars/IST-DASLab/QIGen)
  - [ICML - bit precision: k-bit Inference Scaling Laws
  - [ACL - agnostic Quantization Approach for Pre-trained Language Models
  - [ACL - based Language Models with GPU-Friendly Sparsity and Quantization
  - [EMNLP - based Quantisation: What is Important for Sub-8-bit LLM Inference?
  - [EMNLP - Shot Sharpness-Aware Quantization for Pre-trained Language Models
  - [EMNLP - FP4: 4-Bit Floating-Point Quantized Transformers [[code](https://github.com/nbasyl/LLM-FP4)] ![GitHub Repo stars](https://img.shields.io/github/stars/nbasyl/LLM-FP4)
  - [EMNLP
  - [ISCA - friendly Outlier-Victim Pair Quantization
  - [ArXiv - V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
  - [ArXiv - GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
  - [ArXiv
  - [ArXiv - QAT: Data-Free Quantization Aware Training for Large Language Models
  - [ArXiv - aware Weight Quantization for LLM Compression and Acceleration [[code](https://github.com/mit-han-lab/llm-awq)] ![GitHub Repo stars](https://img.shields.io/github/stars/mit-han-lab/llm-awq)
  - [ArXiv - bit Integers [[code](https://github.com/xijiu9/Train_Transformers_with_INT4)] ![GitHub Repo stars](https://img.shields.io/github/stars/xijiu9/Train_Transformers_with_INT4)
  - [ArXiv - and-Sparse Quantization [[code](https://github.com/SqueezeAILab/SqueezeLLM)] ![GitHub Repo stars](https://img.shields.io/github/stars/SqueezeAILab/SqueezeLLM)
  - [ArXiv
  - [ArXiv - Quantized Representation for Near-Lossless LLM Weight Compression [[code](https://github.com/Vahe1994/SpQR)] ![GitHub Repo stars](https://img.shields.io/github/stars/Vahe1994/SpQR)
  - [ArXiv - Bit Quantization of Large Language Models With Guarantees [[code](https://github.com/jerry-chee/QuIP)] ![GitHub Repo stars](https://img.shields.io/github/stars/jerry-chee/QuIP)
  - [ArXiv
  - [ArXiv
  - [ArXiv - based Post-training Quantization for Large Language Models [[code](https://github.com/hahnyuan/RPTQ4LLM)] ![GitHub Repo stars](https://img.shields.io/github/stars/hahnyuan/RPTQ4LLM)
  - [ArXiv - Bit Quantization on Large Language Models
  - [ArXiv - Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
  - [ArXiv - FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers [[code](https://github.com/lightmatter-ai/INT-FP-QSim)] ![GitHub Repo stars](https://img.shields.io/github/stars/lightmatter-ai/INT-FP-QSim)
  - [ArXiv
  - [ArXiv - FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
  - [ArXiv - Uniform Post-Training Quantization via Power Exponent Search
  - [ArXiv - Scaled Logit Distillation for Ternary Weight Generative Language Models
  - [ArXiv - Based Post-Training Quantization: Challenging the Status Quo
  - [ArXiv - Grained Weight-Only Quantization for LLMs
  - [ArXiv - VQ: Compression for Tractable Internet-Scale Memory
  - [ArXiv - grained Post-Training Quantization for Large Language Models
  - [ArXiv - time Weight Clustering for Large Language Models
  - [ArXiv - based Quantization for Language Models - An Efficient and Intuitive Algorithm
  - [ArXiv - performance Low-bit Quantization of Large Language Models
  - [ArXiv - Training Quantization on Large Language Models
  - [ArXiv - compressor)] ![GitHub Repo stars](https://img.shields.io/github/stars/intel/neural-compressor)
  - [ArXiv - training Quantization with FP8 Formats [[code](https://github.com/intel/neural-compressor)] ![GitHub Repo stars](https://img.shields.io/github/stars/intel/neural-compressor)
  - [ArXiv - LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models [[code](https://github.com/yuhuixu1993/qa-lora)]
  - [ArXiv - bit Weight Quantization of Large Language Models
  - [ArXiv - Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  - [ArXiv - LLM: Partially Binarized Large Language Models [[code](https://github.com/hahnyuan/BinaryLLM)] ![GitHub Repo stars](https://img.shields.io/github/stars/hahnyuan/BinaryLLM)
  - [ArXiv - Grained Quantization for LLM
  - [ArXiv - parameter Tuning of LLMs with Affordable Resources
  - [ArXiv - Bitwidth Quantization for Large Language Models
  - [ArXiv - Fine-Tuning-Aware Quantization for Large Language Models [[code](https://github.com/yxli2123/LoftQ)]
  - [ArXiv - compressor)] ![GitHub Repo stars](https://img.shields.io/github/stars/intel/neural-compressor)
  - [ArXiv - bit Transformers for Large Language Models [[code](https://github.com/kyegomez/BitNet)] ![GitHub Repo stars](https://img.shields.io/github/stars/kyegomez/BitNet)
  - [ArXiv - LM: Training FP8 Large Language Models [[code](https://github.com/Azure/MS-AMP)] ![GitHub Repo stars](https://img.shields.io/github/stars/Azure/MS-AMP)
  - [ArXiv - bit Quantization for Efficient and Accurate LLM Serving [[code](https://github.com/efeslab/Atom)] ![GitHub Repo stars](https://img.shields.io/github/stars/efeslab/Atom)
  - [ArXiv - Training Quantization with Activation-Weight Equalization for Large Language Models
  - [ArXiv
  - [ArXiv - LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning [[code](https://github.com/HanGuo97/lq-lora)]
  - [ICML
  - [ICML - Shot [[code](https://github.com/IST-DASLab/sparsegpt)] ![GitHub Repo stars](https://img.shields.io/github/stars/IST-DASLab/sparsegpt)
  - [ICML - Rank and Sparse Approximation [[code](https://github.com/yxli2123/LoSparse)] ![GitHub Repo stars](https://img.shields.io/github/stars/yxli2123/LoSparse)
  - [ICML
  - [ICLR - DASLab/gptq)] ![GitHub Repo stars](https://img.shields.io/github/stars/IST-DASLab/gptq)
  - [ICLR
  - [NeurIPS - bit Transformer Language Models [[code](https://github.com/wimh966/outlier_suppression)] ![GitHub Repo stars](https://img.shields.io/github/stars/wimh966/outlier_suppression)
  - [NeurIPS - Pruner: On the Structural Pruning of Large Language Models [[code](https://github.com/horseee/LLM-Pruner)] ![GitHub Repo stars](https://img.shields.io/github/stars/horseee/LLM-Pruner)
  - [AutoML
  - [VLDB - LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity [[code](https://github.com/AlibabaResearch/flash-llm)] ![GitHub Repo stars](https://img.shields.io/github/stars/AlibabaResearch/flash-llm)
  - [ArXiv
  - [ArXiv - Rank Parameter-Efficient Fine-Tuning
  - [ArXiv
  - [ArXiv - training via Structured Pruning [[code](https://github.com/princeton-nlp/LLM-Shearing)] ![GitHub Repo stars](https://img.shields.io/github/stars/princeton-nlp/LLM-Shearing)
  - [ArXiv
  - [ArXiv - Centric Angle of LLM Pre-trained Weights through Sparsity [[code](https://github.com/VITA-Group/Junk_DNA_Hypothesis)] ![GitHub Repo stars](https://img.shields.io/github/stars/VITA-Group/Junk_DNA_Hypothesis)
  - [ArXiv
  - [ArXiv - Free Fine-tuning for Sparse LLMs [[code](https://github.com/zyxxmu/DSnoT)] ![GitHub Repo stars](https://img.shields.io/github/stars/zyxxmu/DSnoT)
  - [ArXiv - Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models
  - [ArXiv - Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity
  - [ArXiv
  - [ACL - of-Thought Distillation: Small Models Can Also "Think" Step-by-Step [[code](https://github.com/allenai/cot_distillation)]
  - [ACL
  - [ACL
  - [ACL - Consistent Chain-of-Thought Distillation [[code](https://github.com/wangpf3/consistent-CoT-distillation)]
  - [ACL - KD: Attribution-Driven Knowledge Distillation for Language Model Compression
  - [ACL - teacher)]
  - [ACL
  - [ACL - effective Distillation of Large Language Models [[code](https://github.com/Sayan21/MAKD)]
  - [ACL - by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes [[code](https://github.com/google-research/distilling-step-by-step)]
  - [EMNLP - to-Reason)]
  - [EMNLP - EMNLP-2023)]
  - [EMNLP - KD: Multi-CoT Consistent Knowledge Distillation
  - [EMNLP
  - [ArXiv - LM: A Diverse Herd of Distilled Models from Large-Scale Instructions [[code](https://github.com/mbzuai-nlp/LaMini-LM)]
  - [ArXiv - agnostic Distillation of Encoder-Decoder Language Models
  - [ArXiv - Source Large Language Model [[code](https://github.com/YJiangcm/Lion)]
  - [ArXiv - aided Distillation Specializes Large Models in Reasoning
  - [ArXiv
  - [code
  - [ArXiv
  - [ArXiv
  - [ArXiv - regressive Sequence Models
  - [ArXiv - of-Thought Prompt Distillation for Multimodal Named Entity Recognition and Multimodal Relation Extraction
  - [ArXiv - CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
  - [ArXiv
  - [ArXiv
  - [ArXiv - training Pruning and Quantization of Large Language Models?
  - [Nature - efficient fine-tuning of large-scale pre-trained language models [[code](https://github.com/thunlp/OpenDelta)]
  - [NeurIPS - trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning [[code](https://github.com/BaohaoLiao/mefts)]
  - [ArXiv - Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
  - [ArXiv - tuning of Long-Context Large Language Models [[code](https://github.com/dvlab-research/LongLoRA)]
  - [ArXiv - LoRA: Serving Thousands of Concurrent LoRA Adapters [[code](https://github.com/S-LoRA/S-LoRA)]
  - [ACL - instruction-effectiveness)]
  - [EMNLP - nlp/AutoCompressors)]
  - [EMNLP
  - [EMNLP
  - [EMNLP - ai/batch-prompting)]
  - [ArXiv
  - [ArXiv - Context Learning
  - [ArXiv - Efficiency Trade-off of LLM Inference with Transferable Prompt
  - [ArXiv - context Autoencoder for Context Compression in a Large Language Model [[code](https://github.com/getao/icae)]
  - [ArXiv
  - [ArXiv
  - [ArXiv
  - [ArXiv
  - [ArXiv - Augmented LMs with Compression and Selective Augmentation [[code](https://github.com/carriex/recomp)]
  - [ArXiv - context Attention in Near-Linear Time
  - [ArXiv
