awesome-AI-system
Papers and their code for AI systems
https://github.com/lambda7xx/awesome-AI-system
-
Paper-Code
-
Attention
- https://chat.lmsys.org/
-
LLM FineTune
-
LLM Inference (System Side)
-
Misc
- AI-Enabling Workloads on Large-Scale GPU-Accelerated System: Characterization, Opportunities, and Implications HPCA'22
- Characterizing Variability in Large-Scale, Accelerator-Rich Systems SC'22
- Prediction of the Resource Consumption of Distributed Deep Learning Systems SIGMETRICS'22
-
MoE
- SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Static and Dynamic Parallelization ATC'23
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts MLSYS'23
- Tutel: Adaptive Mixture-of-Experts at Scale MLSYS'23
- AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ICLR'23
- FastMoE: A Fast Mixture-of-Expert Training System PPOPP'22
- awesome MoE
- MoE Paper
-
Optimization
- MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant GPU Clusters SOCC'22
- AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators HPCA'20
- iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud TPDS'22
- CheckFreq: Frequent, Fine-Grained DNN Checkpointing FAST'22
- Harmony: Overcoming the hurdles of GPU memory capacity to train massive DNN models on commodity servers VLDB'22
- PetS: A Unified Framework for Parameter-Efficient Transformers Serving ATC'22
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections OSDI'21
- Fluid: Resource-Aware Hyperparameter Tuning Engine MLSYS'21
- Baechi: Fast Device Placement on Machine Learning Graphs SOCC'20
- Dynamic Parameter Allocation in Parameter Servers VLDB'20
- Data Movement Is All You Need: A Case Study on Optimizing Transformers
- Spada: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow ASPLOS'23
- Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs ASPLOS'23
- GLake: optimizing GPU memory management and IO transmission ASPLOS'24
- Efficient Quantized Sparse Matrix Operations on Tensor Cores SC'22
- APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Core SC'21
- iGUARD: In-GPU Advanced Race Detection SOSP'21
-
Parallelism Training
- Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization OSDI'22
- DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines EuroSys'24
- HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis EuroSys'24
- PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices MLSYS'23
- Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs NSDI'23
- MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism HPCA'23
- Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression ASPLOS'23
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning OSDI'22
- AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness NeurIPS'22
- Piper: Multidimensional Planner for DNN Parallelization NeurIPS'21
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models ICML'21
- TeraPipe: Large-Scale Language Modeling with Pipeline Parallelism ICML'21
- PipeDream: Pipeline Parallelism for DNN Training SOSP'19
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
- Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models
- Near Zero Bubble Pipeline Parallelism ICLR'24
- zero-bubble-pipeline-parallelism
- Aceso: Efficient Parallel DNN Training through Iterative Bottleneck Alleviation EuroSys'24
- Calculon: A Methodology and Tool for High-Level Co-Design of Systems and Large Language Models SC'23
- NASPipe: High Performance and Reproducible Pipeline Parallel Supernet Training via Causal Synchronous Parallelism
- Varuna: Scalable, Low-cost Training of Massive Deep Learning Models
- Chimera: efficiently training large-scale neural networks with bidirectional pipelines SC'21
- DAPPLE: An Efficient Pipelined Data Parallel Approach for Large Models Training PPOPP'21
- awesome distributed deep learning
- awesome parallelism
-
Categories
Sub Categories
- Parallelism Training: 61
- Serving-Inference: 46
- Optimization: 37
- GNN: 26
- GPU Cluster Management: 17
- Schedule and Resource Management: 16
- MoE: 14
- Framework: 11
- Communication: 10
- Training: 8
- Misc: 6
- LLM Serving: 5
- LLM Serving Framework: 5
- Fancy LLM: 5
- Energy: 4
- Fine-Tune: 3
- LLM FineTune: 2
- LoRA: 1
- LLM Inference (System Side): 1
- LLM Evaluation Platform: 1
- LLM Robustness and Debugging: 1
- Researcher: 1
- Attention: 1