Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ModelTC/awesome-lm-system
Summary of system papers/frameworks/code/tools for training or serving large models
List: awesome-lm-system
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/ModelTC/awesome-lm-system
- Owner: ModelTC
- License: apache-2.0
- Created: 2023-06-21T15:40:53.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-17T10:24:11.000Z (11 months ago)
- Last Synced: 2024-05-21T23:00:31.320Z (6 months ago)
- Size: 35.2 KB
- Stars: 56
- Watchers: 9
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- ultimate-awesome - awesome-lm-system - Summary of system papers/frameworks/code/tools for training or serving large models. (Other Lists / PowerShell Lists)
README
# Awesome Large Model (LM) System [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
This repo collects papers, repositories, and tools for large model systems, covering training, inference, serving, and compression.
- [Awesome Large Model (LM) System](#awesome-large-model-lm-system-)
  - [Papers](#papers)
    - [Training](#training)
    - [Compression](#compression)
    - [Inference](#inference)
    - [Benchmark](#benchmark)
    - [Survey](#survey)
  - [Frameworks](#frameworks)

## Papers
### Training
| Year | Publisher | Title | Framework |
| :--: | :----------: | :----------------------------------------------------------- | :-----------------------: |
| 2023 | | [Training Diffusion Models with Reinforcement Learning](https://arxiv.org/abs/2305.13301) | |
| 2023 | | [Extracting Training Data from Diffusion Models](https://arxiv.org/abs/2301.13188) | |
| 2023 | ICLR | [DySR: Adaptive Super-Resolution via Algorithm and System Co-design](https://openreview.net/forum?id=Pgtn4l6eKjv) | [DeepSpeed](#ds) |
| 2023 | | [Scaling Vision-Language Models with Sparse Mixture of Experts](https://arxiv.org/abs/2303.07226) | [DeepSpeed](#ds) |
| 2023 | IPDPS | [MCR-DL: Mix-and-Match Communication Runtime for Deep Learning](https://arxiv.org/abs/2303.08374) | [DeepSpeed](#ds) |
| 2023 | ICS | [A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training](https://arxiv.org/abs/2303.06318) | [DeepSpeed](#ds) |
| 2023 | OSDI | [AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving](https://arxiv.org/abs/2302.11665) | [Alpa](#alpa) |
| 2023 | MLSys | [On Optimizing the Communication of Model Parallelism](https://arxiv.org/abs/2211.05322) | [Alpa](#alpa) |
| 2023 | | [Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models](https://arxiv.org/abs/2302.02599) | [ColossalAI](#colossalai) |
| 2022 | CVPR | [Perception Prioritized Training of Diffusion Models](https://openaccess.thecvf.com/content/CVPR2022/papers/Choi_Perception_Prioritized_Training_of_Diffusion_Models_CVPR_2022_paper.pdf) | |
| 2022 | | [Reducing Activation Recomputation in Large Transformer Models](https://arxiv.org/abs/2205.05198) | [Megatron-LM](#megatron) |
| 2022 | HiPC | [1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed](https://ieeexplore.ieee.org/document/10106313) | [DeepSpeed](#ds) |
| 2022 | NeurIPS | [The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models](https://openreview.net/forum?id=JpZ5du_Kdh) | [DeepSpeed](#ds) |
| 2022 | | [Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam](https://arxiv.org/abs/2202.06009) | [DeepSpeed](#ds) |
| 2022 | ICML | [DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale](https://proceedings.mlr.press/v162/rajbhandari22a.html) | [DeepSpeed](#ds) |
| 2022 | | [Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model](https://arxiv.org/abs/2201.11990) | [DeepSpeed](#ds) |
| 2022 | | [Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers](https://arxiv.org/abs/2211.11586) | [DeepSpeed](#ds) |
| 2022 | | [DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing](https://arxiv.org/abs/2212.03597) | [DeepSpeed](#ds) |
| 2022 | OSDI | [Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning](https://www.usenix.org/conference/osdi22/presentation/zheng-lianmin) | [Alpa](#alpa) |
| 2022 | ICPP | [Tesseract: Parallelize the Tensor Parallelism Efficiently](https://dl.acm.org/doi/abs/10.1145/3545008.3545087) | [ColossalAI](#colossalai) |
| 2022 | | [A Frequency-aware Software Cache for Large Recommendation System Embeddings](https://arxiv.org/abs/2208.05321) | [ColossalAI](#colossalai) |
| 2022 | TPDS | [Parallel Training of Pre-Trained Models via Chunk-Based Dynamic Memory Management](https://www.computer.org/csdl/journal/td/2023/01/09940581/1I6O79tPnwc) | [ColossalAI](#colossalai) |
| 2021 | | [Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM](https://arxiv.org/abs/2104.04473) | [Megatron-LM](#megatron) |
| 2021 | | [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) | |
| 2021 | SC | [ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning](https://dl.acm.org/doi/abs/10.1145/3458817.3476205) | [DeepSpeed](#ds) |
| 2021 | ICML | [1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed](http://proceedings.mlr.press/v139/tang21a.html) | [DeepSpeed](#ds) |
| 2021 | ATC | [ZeRO-Offload: Democratizing Billion-Scale Model Training.](https://www.usenix.org/conference/atc21/presentation/ren-jie) | [DeepSpeed](#ds) |
| 2021 | PPoPP | [DAPPLE: a pipelined data parallel approach for training large models](https://dl.acm.org/doi/10.1145/3437801.3441593) | |
| 2021 | ICML | [TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models](https://icml.cc/virtual/2021/poster/9181) | [TeraPipe](#terapipe) |
| 2021 | ICML | [Memory-Efficient Pipeline-Parallel DNN Training](https://icml.cc/virtual/2021/spotlight/10458) | [PipeDream](#pipedream) |
| 2021 | | [An Efficient 2D Method for Training Super-Large Deep Learning Models](https://arxiv.org/abs/2104.05343) | [ColossalAI](#colossalai) |
| 2021 | | [Maximizing Parallelism in Distributed Training for Huge Neural Networks](https://arxiv.org/abs/2105.14450) | [ColossalAI](#colossalai) |
| 2021 | | [Sequence Parallelism: Long Sequence Training from System Perspective](https://arxiv.org/abs/2105.13120) | [ColossalAI](#colossalai) |
| 2021 | | [Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training](https://arxiv.org/abs/2110.14883) | [ColossalAI](#colossalai) |
| 2020 | KDD Tutorial | [DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters.](https://dl.acm.org/doi/10.1145/3394486.3406703) | [DeepSpeed](#ds) |
| 2020 | SC | [ZeRO: memory optimizations toward training trillion parameter models.](https://dl.acm.org/doi/10.5555/3433701.3433727) | [DeepSpeed](#ds) |
| 2020 | NeurIPS | [Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping](https://proceedings.neurips.cc/paper/2020/hash/a1140a3d0df1c81e24ae954d935e8926-Abstract.html) | [DeepSpeed](#ds) |
| 2020 | | [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) | [Megatron-LM](#megatron) |
| 2020 | | [torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models](https://arxiv.org/abs/2004.09910) | [TorchGpipe](#gpipe) |
| 2019 | NeurIPS | [GPipe: efficient training of giant neural networks using pipeline parallelism](https://papers.nips.cc/paper_files/paper/2019/hash/093f65e080a295f8076b1c5722a46aa2-Abstract.html) | [TorchGpipe](#gpipe) |
| 2019 | SOSP | [PipeDream: Generalized pipeline parallelism for DNN training](https://dl.acm.org/doi/10.1145/3341301.3359646) | [PipeDream](#pipedream) |
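Several of the training papers above (GPipe, torchgpipe, PipeDream, TeraPipe) build on pipeline parallelism: the model is cut into stages and micro-batches are streamed through them so all devices stay busy. Below is a minimal, non-authoritative sketch of that idea using the torchgpipe package listed under [Frameworks](#frameworks); the layer sizes, device list, and chunk count are assumptions for illustration only.

```python
# A minimal GPipe-style pipeline-parallel sketch using torchgpipe.
# Assumptions: `pip install torchgpipe` and two CUDA devices; sizes are arbitrary.
import torch
import torch.nn as nn
from torchgpipe import GPipe

# The model must be an nn.Sequential so it can be cut into pipeline stages.
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(),    # stage 0 (2 child modules)
    nn.Linear(4096, 4096), nn.ReLU(),    # stage 1 (3 child modules)
    nn.Linear(4096, 1024),
)

# balance = child modules per stage; chunks = number of micro-batches streamed
# through the pipeline to overlap work between the two stages.
model = GPipe(model, balance=[2, 3], devices=["cuda:0", "cuda:1"], chunks=4)

x = torch.randn(64, 1024, device=model.devices[0])   # input lives on stage 0
out = model(x)                                        # output lands on the last stage
out.sum().backward()
```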
### Compression

| Year | Publisher | Title | Framework |
| :--- | --------- | ------------------------------------------------------------ | ---------------- |
| 2023 | | [Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge](https://arxiv.org/abs/2312.05693) | |
| 2023 | | [CBQ: Cross-Block Quantization for Large Language Models](https://arxiv.org/abs/2312.07950) | |
| 2023 | | [Norm Tweaking: High-performance Low-bit Quantization of Large Language Models](https://arxiv.org/abs/2309.02784) | |
| 2023 | | [Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM](https://arxiv.org/abs/2310.04836) | |
| 2023 | | [Atom: Low-bit Quantization for Efficient and Accurate LLM Serving](https://arxiv.org/abs/2310.19102) | |
| 2023 | | [RPTQ: Reorder-based Post-training Quantization for Large Language Models](https://arxiv.org/abs/2304.01089) | |
| 2023 | | [SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression](https://arxiv.org/abs/2306.03078) | |
| 2023 | | [LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning](https://arxiv.org/abs/2311.12023) | |
| 2023 | | [QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2309.14717) | |
| 2023 | | [LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models](https://arxiv.org/abs/2310.08659) | |
| 2023 | | [AffineQuant: Affine Transformation Quantization for Large Language Models](https://openreview.net/forum?id=of2rhALq8l) | |
| 2023 | | [LLM-QAT: Data-Free Quantization Aware Training for Large Language Models](https://arxiv.org/abs/2305.17888) | |
| 2023 | | [QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models](https://arxiv.org/abs/2310.08041) | |
| 2023 | | [LLM-Pruner: On the Structural Pruning of Large Language Models](https://arxiv.org/abs/2305.11627) | |
| 2023 | | [OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models](https://arxiv.org/abs/2308.13137) | |
| 2023 | | [SqueezeLLM: Dense-and-Sparse Quantization](https://arxiv.org/abs/2306.07629) | |
| 2023 | | [A Simple and Effective Pruning Approach for Large Language Models](https://arxiv.org/abs/2306.11695) | |
| 2023 | | [On Architectural Compression of Text-to-Image Diffusion Models](https://arxiv.org/pdf/2305.15798.pdf) | |
| 2023 | ICML | [SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot](https://arxiv.org/abs/2301.00774) | |
| 2023 | | [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978) | |
| 2023 | | [OWQ: Lessons learned from activation outliers for weight quantization in large language models](https://arxiv.org/abs/2306.02272) | |
| 2023 | ICLR | [GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers](https://arxiv.org/abs/2210.17323) | |
| 2023 | ISCA | [OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization](https://dl.acm.org/doi/abs/10.1145/3579371.3589038) | |
| 2023 | | [Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing](https://arxiv.org/abs/2306.12929) | |
| 2023 | | [ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation](https://arxiv.org/abs/2303.08302) | |
| 2023 | ICML | [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models](https://arxiv.org/abs/2211.10438) | |
| 2023 | ICML | [Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases](https://arxiv.org/abs/2301.12017) | [DeepSpeed](#ds) |
| 2023 | | [Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling](https://arxiv.org/pdf/2304.09145) | |
| 2023 | | [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314) | |
| 2022 | NeurIPS | [ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers](https://openreview.net/forum?id=f-fVCElZ-G1) | [DeepSpeed](#ds) |
| 2022 | NeurIPS | [Extreme Compression for Pre-trained Transformers Made Simple and Efficient](https://openreview.net/forum?id=xNeAhc2CNAl) | [DeepSpeed](#ds) |
| 2022 | NeurIPS | [Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models](https://proceedings.neurips.cc/paper_files/paper/2022/file/6f6db140de9c9f111b12ef8a216320a9-Paper-Conference.pdf) | |
| 2022 | NeurIPS | [LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale](https://arxiv.org/abs/2208.07339) | |
| 2021 | EMNLP | [Understanding and Overcoming the Challenges of Efficient Transformer Quantization](https://arxiv.org/abs/2109.12948) | |
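Most of the compression papers above start from, and then improve on, plain round-to-nearest post-training quantization of the weight matrices. The sketch below shows only that baseline (symmetric, per-output-channel INT8 scales); methods such as GPTQ, AWQ, SmoothQuant, or OmniQuant add calibration data, error compensation, or activation-aware scaling on top. The tensor shapes are illustrative.

```python
# A hedged baseline sketch of symmetric, per-output-channel round-to-nearest
# INT8 weight quantization in PyTorch. Not any specific paper's method.
import torch

def quantize_weight_int8(w: torch.Tensor):
    """w: [out_features, in_features] -> (int8 weights, per-channel scales)."""
    max_abs = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
    scale = max_abs / 127.0                        # one scale per output channel
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 11008)                       # toy weight matrix
q, scale = quantize_weight_int8(w)
w_hat = dequantize_int8(q, scale)
print("mean abs quantization error:", (w - w_hat).abs().mean().item())
```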
### Inference

| Year | Publisher | Title | Framework |
|:----:|:---------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------:|
| 2023 | | [Fast Inference in Denoising Diffusion Models via MMD Finetuning](https://arxiv.org/pdf/2301.07969v1.pdf) | |
| 2023 | | [EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models](https://arxiv.org/abs/2209.02341) | [EnergonAI](#energon) |
| 2023 | | [H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models](https://arxiv.org/abs/2306.14048) | |
| 2023 | | [FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU](https://arxiv.org/abs/2303.06865) | |
| 2022 | ICML | [DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale](https://proceedings.mlr.press/v162/rajbhandari22a.html) | [DeepSpeed](#ds) |
| 2022 | SC | [DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale](https://dl.acm.org/doi/abs/10.5555/3571885.3571946) | [DeepSpeed](#ds) |
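A recurring theme in the inference papers above (H2O, FlexGen, DeepSpeed Inference) is managing the key/value cache during autoregressive decoding. The toy single-head sketch below shows the basic mechanism those systems optimize: each step appends one new key/value pair instead of re-encoding the whole prefix. All names and sizes are illustrative and not taken from any of the papers.

```python
# A toy, single-head illustration of KV caching during autoregressive decoding.
# Real engines add batching, multi-head attention, paged or evicted caches, etc.
import torch

def attend(q, K, V):
    # q: [1, d], K/V: [t, d] -> attention over all cached positions
    scores = (q @ K.T) / (K.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ V

d = 64
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

K_cache = torch.empty(0, d)   # grows by one row per generated token
V_cache = torch.empty(0, d)
x = torch.randn(1, d)         # embedding of the current token

for _ in range(8):            # decode 8 tokens
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # append only the new key/value instead of recomputing the whole prefix
    K_cache = torch.cat([K_cache, k], dim=0)
    V_cache = torch.cat([V_cache, v], dim=0)
    x = attend(q, K_cache, V_cache)   # stand-in for the next token's embedding
```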
### Benchmark

| Year | Publisher | Title | Framework |
| :---: | :-------: | :----- | :-------: |
### Survey
| Year | Publisher | Title | Framework |
| :---: | :-------: | :----- | :-------: |
## Frameworks
| Year | Name | Training | Inference | Serving | Comments |
|:----:|:---------------------------------------------------------------------------------------------------:|:---------|:---------:|:-------:|:----------------------------------------------------------------------------------------|
| 2023 | [EnergonAI](https://github.com/hpcaitech/EnergonAI) | ✗ | ✔ | ✗ | |
| 2022 | [Alpa](https://github.com/alpa-projects/alpa) | ✔ | ✔ | ✔ | Compilation-based mixed parallelism |
| 2021 | [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed) | ✔ | ✗ | ✗ | Adds MoE model training, curriculum learning, and 3D parallelism from DeepSpeed to Megatron |
| 2021 | [TeraPipe](https://github.com/zhuohan123/terapipe) | ✔ | ✗ | ✗ | |
| 2021 | [ColossalAI](https://github.com/hpcaitech/ColossalAI) | ✔ | ✔ | ✔ | |
| 2021 | [FasterTransformer](https://github.com/NVIDIA/FasterTransformer) | ✗ | ✔ | ✗ | |
| 2020 | [DeepSpeed](https://github.com/microsoft/DeepSpeed) | ✔ | ✔ | ✗ | General support for Transformers and MoE with 3D parallelism |
| 2019 | [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) | ✔ | ✗ | ✗ | |
| 2019 | [PipeDream](https://github.com/msr-fiddle/pipedream) | ✔ | ✗ | ✗ | |
| 2019 | [TorchGpipe](https://github.com/kakaobrain/torchgpipe) | ✔ | ✗ | ✗ | `torchgpipe` was merged into PyTorch in 2020. |
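As a concrete example of how one of these frameworks is typically driven, here is a hedged sketch of wrapping a model with DeepSpeed's ZeRO stage-2 partitioning (the approach behind the ZeRO / ZeRO-Offload / ZeRO-Infinity papers above). The config keys follow DeepSpeed's documented JSON schema, but the model, batch size, and learning rate are placeholders.

```python
# A hedged sketch of training with DeepSpeed's ZeRO stage-2 partitioning.
# Assumptions: `pip install deepspeed`, a CUDA device, and launching the script
# with the `deepspeed` launcher (which initializes torch.distributed).
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},   # partition optimizer states + gradients
}

model = torch.nn.Linear(1024, 1024)      # stand-in for a real transformer
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(4, 1024, device=engine.device).half()
loss = engine(x).float().pow(2).mean()   # dummy loss for illustration
engine.backward(loss)                    # DeepSpeed handles loss scaling
engine.step()                            # and the partitioned optimizer step
```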