https://github.com/UbiquitousLearning/Paper-list-resource-efficient-large-language-model
> :warning: This repository is no longer maintained. Check out our [survey paper](https://arxiv.org/pdf/2401.08092.pdf) on efficient LLMs and the corresponding [paper list](https://github.com/UbiquitousLearning/Efficient_Foundation_Model_Survey).
# Paper-list-resource-efficient-large-language-model
Target venues: system conferences (OSDI/SOSP/ATC/EuroSys), architecture conferences (ISCA/MICRO/ASPLOS/HPCA), network conferences (NSDI/SIGCOMM), mobile conferences (MobiCom/MobiSys/SenSys/UbiComp), AI conferences (NeurIPS/ACL/ICLR/ICML)
We will keep maintaining this list :)
Note: We currently focus only on inference; we plan to include training work in the future.
Example: [Conference'year] [Title](doi), First-author Affiliation
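For instance, an entry already in this list (Orca, from the Inference engine section below) follows this format:

```markdown
[OSDI'22] [Orca: A Distributed Serving System for Transformer-Based Generative Models](https://www.usenix.org/system/files/osdi22-yu.pdf), Seoul National University
```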
## Model
[ICLR'23] [GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers](https://openreview.net/pdf?id=tcbBPnfwxS), IST Austria
[ICLR'23] [Token Merging: Your ViT But Faster](https://openreview.net/pdf?id=JroZRaRw7Eu), Georgia Tech
[ICLR'23] [Efficient Attention via Control Variates](https://openreview.net/pdf?id=G-uNfHKrj46), University of Hong Kong
[ICLR'23] [HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer](https://openreview.net/pdf?id=3F6I-0-57SC), University of Chinese Academy of Sciences
[ICLR'23] [Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models](https://openreview.net/pdf?id=a2jNdqE2102), Tencent AI Lab
[MLSys'23] [Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization](https://mlsys.org/Conferences/2023/Schedule?showEvent=2442), National University of Singapore
[ACL'22] [AraT5: Text-to-Text Transformers for Arabic Language Generation](https://aclanthology.org/2022.acl-long.47/), The University of British Columbia
[ACL'22] [ClusterFormer: Neural Clustering Attention for Efficient and Effective Transformer](https://aclanthology.org/2022.acl-long.170/), Tianjin University
[ACL'22] [∞-former: Infinite Memory Transformer](https://aclanthology.org/2022.acl-long.375/), Instituto de Telecomunicações
[ACL'22] [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https://aclanthology.org/2022.acl-long.534/), South China University of Technology
[ACL'22] [PLANET: Dynamic Content Planning in Autoregressive Transformers for Long-form Text Generation](https://aclanthology.org/2022.acl-long.163/), Baidu Inc
[ICLR'22] [Memorizing Transformers](https://openreview.net/pdf?id=TrjbxzRcnf-), Google
[ICLR'22] [Understanding the Role of Self Attention for Efficient Speech Recognition](https://openreview.net/pdf?id=AvcfxqRy4Y), Seoul National University
[NeurIPS'22] [Confident Adaptive Language Modeling](https://openreview.net/pdf?id=uLYc4L3C81A), Google Research
[NeurIPS'22] [Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling](https://proceedings.neurips.cc/paper_files/paper/2022/hash/486ff0b164cf92b0255fe39863bcf99e-Abstract-Conference.html), Microsoft Research Asia
[NeurIPS'22] [Large Language Models are Zero-Shot Reasoners](https://proceedings.neurips.cc/paper_files/paper/2022/hash/8bb0d291acd4acf06ef112099c16f326-Abstract-Conference.html), The University of Tokyo
[NeurIPS'22] [Training language models to follow instructions with human feedback](https://proceedings.neurips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html), OpenAI
[ACL'21] [RealFormer: Transformer Likes Residual Attention](https://aclanthology.org/2021.findings-acl.81/), Google Research
[NeurIPS'21] [Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices](https://proceedings.neurips.cc/paper_files/paper/2021/hash/09def3ebbc44ff3426b28fcd88c83554-Abstract.html), Virginia Commonwealth University
[NeurIPS'21] [Systematic Generalization with Edge Transformers](https://proceedings.neurips.cc/paper_files/paper/2021/hash/0a4dc6dae338c9cb08947c07581f77a2-Abstract.html), University of California
[NeurIPS'21] [NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM](https://proceedings.neurips.cc/paper_files/paper/2021/hash/0e4f5cc9f4f3f7f1651a6b9f9214e5b1-Abstract.html), Colorado School of Mines
[NeurIPS'21] [Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems](https://proceedings.neurips.cc/paper_files/paper/2021/hash/2bd388f731f26312bfc0fe30da009595-Abstract.html), Jadavpur University
[NeurIPS'21] [Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification](https://proceedings.neurips.cc/paper_files/paper/2021/hash/3bbca1d243b01b47c2bf42b29a8b265c-Abstract.html), Amazon
[NeurIPS'21] [Sparse is Enough in Scaling Transformers](https://proceedings.neurips.cc/paper_files/paper/2021/hash/51f15efdd170e6043fa02a74882f0470-Abstract.html), Google Research
[NeurIPS'21] [Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation](https://proceedings.neurips.cc/paper_files/paper/2021/hash/8ce241e1ed84937ee48322b170b9b18c-Abstract.html), Macquarie University
[NeurIPS'21] [Long-Short Transformer: Efficient Transformers for Language and Vision](https://proceedings.neurips.cc/paper_files/paper/2021/hash/9425be43ba92c2b4454ca7bf602efad8-Abstract.html), University of Maryland
[NeurIPS'21] [Combiner: Full Attention Transformer with Sparse Computation Cost](https://proceedings.neurips.cc/paper_files/paper/2021/hash/bd4a6d0563e0604510989eb8f9ff71f5-Abstract.html), Stanford University
[NeurIPS'21] [FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention](https://proceedings.neurips.cc/paper_files/paper/2021/hash/f621585df244e9596dc70a39b579efb1-Abstract.html), University of California
[NeurIPS'21] [Searching for Efficient Transformers for Language Modeling](https://proceedings.neurips.cc/paper_files/paper/2021/hash/2f3c6a4cd8af177f6456e7e51a916ff3-Abstract.html), Google Research
[SenSys'21] [LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU Sensing Applications](https://dapowan.github.io/files/LIMU-BERT.pdf), Nanyang Technological University
[NeurIPS'20] [Deep Transformers with Latent Depth](https://proceedings.neurips.cc/paper_files/paper/2020/hash/1325cdae3b6f0f91a1b629307bf2d498-Abstract.html), Facebook AI Research
[NeurIPS'20] [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://proceedings.neurips.cc/paper_files/paper/2020/hash/2cd2915e69546904e4e5d4a2ac9e1652-Abstract.html), Carnegie Mellon University
[NeurIPS'20] [MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers](https://proceedings.neurips.cc/paper_files/paper/2020/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html), Microsoft Research
[NeurIPS'20] [Big Bird: Transformers for Longer Sequences](https://proceedings.neurips.cc/paper_files/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html), Google Research
[NeurIPS'20] [Fast Transformers with Clustered Attention](https://proceedings.neurips.cc/paper_files/paper/2020/hash/f6a8dd1c954c8506aadc764cc32b895e-Abstract.html), Idiap Research Institute, Switzerland
[NeurIPS'19] [Levenshtein Transformer](https://proceedings.neurips.cc/paper_files/paper/2019/hash/675f9820626f5bc0afb47b57890b466e-Abstract.html), Facebook AI Research
[NeurIPS'19] [Novel positional encodings to enable tree-based transformers](https://proceedings.neurips.cc/paper_files/paper/2019/hash/6e0917469214d8fbd8c517dcdc6b8dcf-Abstract.html), Microsoft Research
[NeurIPS'19] [A Tensorized Transformer for Language Modeling](https://proceedings.neurips.cc/paper_files/paper/2019/hash/dc960c46c38bd16e953d97cdeefdbc68-Abstract.html), Tianjin University
[ICLR'18] [Non-Autoregressive Neural Machine Translation](https://openreview.net/pdf?id=B1l8BtlCb), University of Hong Kong
## Input
[UbiComp'22] [IF-ConvTransformer: A Framework for Human Activity Recognition Using IMU Fusion and ConvTransformer](https://dl.acm.org/doi/pdf/10.1145/3534584), National University of Defense Technology
## Training algorithm
[MobiCom'23] [Efficient Federated Learning for Modern NLP](https://dl.acm.org/doi/10.1145/3570361.3592505), Beijing University of Posts and Telecommunications
[MobiCom'23] [Federated Few-shot Learning for Mobile NLP](https://dl.acm.org/doi/10.1145/3570361.3613277), Beijing University of Posts and Telecommunications
[ICLR'23] [Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization](https://openreview.net/pdf?id=KGV-GBh8fb), Tsinghua University
[ICLR'23] [Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers](https://openreview.net/pdf?id=j8IiQUM33s), Hong Kong University of Science and Technology
[ATC'23] [Accelerating Distributed MoE Training and Inference with Lina](https://www.usenix.org/conference/atc23/presentation/li-jiamin), City University of Hong Kong
[ATC'23] [SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization](https://www.usenix.org/conference/atc23/presentation/zhai), Tsinghua University
[ICLR'22] [Towards a Unified View of Parameter-Efficient Transfer Learning](https://openreview.net/pdf?id=0RDcd5Axok), Carnegie Mellon University
[NeurIPS'22] [AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning](https://proceedings.neurips.cc/paper_files/paper/2022/hash/4fdf8d49476a8001c91f9e9e90530e13-Abstract-Conference.html), Sun Yat-sen University
[NeurIPS'22] [A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models](https://proceedings.neurips.cc/paper_files/paper/2022/hash/7a27143ea615262a0c122eb179c9b7a6-Abstract-Conference.html), Chinese Academy of Sciences
[NeurIPS'22] [Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively](https://proceedings.neurips.cc/paper_files/paper/2022/hash/869bfd807a513755bef25e3896a19a21-Abstract-Conference.html), Peking University
[NeurIPS'20] [Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping](https://proceedings.neurips.cc/paper_files/paper/2020/hash/a1140a3d0df1c81e24ae954d935e8926-Abstract.html), Microsoft Corporation
[NeurIPS'19] [Ouroboros: On Accelerating Training of Transformer-Based Language Models](https://proceedings.neurips.cc/paper_files/paper/2019/hash/1b79b52d1bf6f71b2b1eb7ca08ed0776-Abstract.html), Duke University
## Inference engine
[ASPLOS'23] [FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks](https://dl.acm.org/doi/10.1145/3575693.3575747), Georgia Institute of Technology
[ISCA'23] [OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization](https://dl.acm.org/doi/10.1145/3579371.3589038), Shanghai Jiao Tong University
[ISCA'23] [FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction](https://dl.acm.org/doi/abs/10.1145/3579371.3589057), Tsinghua University
[EuroSys'23] [Tabi: An Efficient Multi-Level Inference System for Large Language Models](https://yidingwang.xyz/public/files/tabi_eurosys23.pdf), Hong Kong University of Science and Technology
[MLSys'23] [Flex: Adaptive Mixture-of-Experts at Scale](https://mlsys.org/Conferences/2023/Schedule?showEvent=2477), Microsoft Research
[MLSys'23] [Efficiently Scaling Transformer Inference](https://mlsys.org/Conferences/2023/Schedule?showEvent=2463), Google
[OSDI'22] [Orca: A Distributed Serving System for Transformer-Based Generative Models](https://www.usenix.org/system/files/osdi22-yu.pdf), Seoul National University
[ATC'22] [PetS: A Unified Framework for Parameter-Efficient Transformers Serving](https://www.usenix.org/system/files/atc22-zhou-zhe.pdf), Peking University
[NeurIPS'22] [Towards Efficient Post-training Quantization of Pre-trained Language Models](https://proceedings.neurips.cc/paper_files/paper/2022/hash/096347b4efc264ae7f07742fea34af1f-Abstract-Conference.html), Huawei Noah’s Ark Lab
[NeurIPS'22] [Solving Quantitative Reasoning Problems with Language Models](https://proceedings.neurips.cc/paper_files/paper/2022/hash/18abbeef8cfe9203fdf9053c9c4fe191-Abstract-Conference.html), Google Research
[NeurIPS'22] [Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees](https://proceedings.neurips.cc/paper_files/paper/2022/hash/7a43b8eb92cd5f652b78eeee3fb6f910-Abstract-Conference.html), ETH Zürich, Switzerland
[NeurIPS'22] [Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models](https://proceedings.neurips.cc/paper_files/paper/2022/hash/b7c12689a89e98a61bcaa65285a41b7c-Abstract-Conference.html), NC State University
[NeurIPS'22] [Exploring Length Generalization in Large Language Models](https://proceedings.neurips.cc/paper_files/paper/2022/hash/fb7451e43f9c1c35b774bcfad7a5714b-Abstract-Conference.html), Google Research
[ACL'21] [MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers](https://aclanthology.org/2021.findings-acl.188/), Microsoft Research
[ASPLOS'23] [Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models](https://dl.acm.org/doi/pdf/10.1145/3567955.3567959), Google
[MobiCom'23] [LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup](https://dl.acm.org/doi/10.1145/3570361.3613285), Microsoft Research Asia
[ACL'23] [Distilling Script Knowledge from Large Language Models for Constrained Language Planning](https://aclanthology.org/2023.acl-long.236/), Fudan University
[ACL'23] [I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation](https://aclanthology.org/2023.acl-long.535/), University of Southern California
[ACL'23] [Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step](https://aclanthology.org/2023.acl-long.150/), University of California
[ACL'23] [GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model](https://aclanthology.org/2023.acl-industry.15/), Anhui University
[NeurIPS'23] [Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind](https://arxiv.org/abs/2306.09299), University of North Carolina at Chapel Hill
[NeurIPS'23] [Blockwise Parallel Transformer for Large Context Models](https://arxiv.org/abs/2305.19370), UC Berkeley
[NeurIPS'23] [LLM-Pruner: On the Structural Pruning of Large Language Models](https://arxiv.org/abs/2305.11627), National University of Singapore
[NeurIPS'23] [The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter](https://arxiv.org/abs/2306.03805), University of Texas at Austin
[NeurIPS'23] [Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time](https://arxiv.org/abs/2305.17118), Rice University
[NeurIPS'23] [Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers](https://arxiv.org/abs/2305.15805), ETH Zürich
[NeurIPS'23] [QuIP: 2-Bit Quantization of Large Language Models With Guarantees](https://arxiv.org/abs/2307.13304), Cornell University
## Training engine
[ASPLOS'23] [Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression](https://dl.acm.org/doi/10.1145/3575693.3575712), Yonsei University
[ASPLOS'23] [Mobius: Fine Tuning Large-Scale Models on Commodity GPU Servers](https://dl.acm.org/doi/10.1145/3575693.3575703), Tsinghua University
[HPCA'23] [MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism](https://ieeexplore.ieee.org/document/10071077), USTC
[HPCA'23] [OptimStore: In-Storage Optimization of Large Scale DNNs with On-Die Processing](https://ieeexplore.ieee.org/document/10071024), KAIST
[NeurIPS'23] [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314), University of Washington
## Compiler
## Hardware
## Search engine
[UbiComp'23] [ODSearch: Fast and Resource Efficient On-device Natural Language Search for Fitness Trackers' Data](https://dl.acm.org/doi/10.1145/3569488), Boston University