awesome-AI-system
Papers and their code for AI systems
https://github.com/lambda7xx/awesome-AI-system
Paper-Code
Parallelism Training
Researcher
Schedule and Resource Management
- An interference-aware scheduler for fine-grained GPU sharing Resources EuroSys'24
- ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning ASPLOS'23
- Multi-Resource Interleaving for Deep Learning Training SIGCOMM'22
- Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training ASPLOS'24
- KungFu: Making Training in Distributed Machine Learning Adaptive OSDI'20
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications OSDI'20
- Out-of-order backprop: an effective scheduling technique for deep learning EuroSys'22
Serving-Inference
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving OSDI'23
- Optimizing Dynamic Neural Networks with Brainstorm OSDI'23
- Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access EuroSys'23
- Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs ASPLOS'23
- MPCFormer: fast, performant, and private transformer inference with MPC ICLR'23
- High-throughput Generative Inference of Large Language Models with a Single GPU ICML'23
- VELTAIR: Towards High-Performance Multi-Tenant Deep Learning Serving via Adaptive Compilation and Scheduling ASPLOS'22
- DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs ATC'22
- Cocktail: A Multidimensional Optimization for Model Serving in Cloud NSDI'22
- Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing ATC'22
- INFaaS: Automated Model-less Inference Serving ATC'21
- Enable Simultaneous DNN Services Based on Deterministic Operator Overlap and Precise Latency Prediction SC'21
- Serving DNNs like Clockwork: Performance Predictability from the Bottom Up OSDI'20
- Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving ATC'19
- Nexus: a GPU cluster engine for accelerating DNN-based video analysis SOSP'19
- Clipper: A low-latency prediction-serving system NSDI'17
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling SOSP'23
- RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference Using a Diverse Pool of Cloud Computing Instances SC'21
Training
- ModelKeeper: Accelerating DNN Training via Automated Training Warmup NSDI'23
- Whale: Efficient Giant Model Training over Heterogeneous GPUs ATC'22
- GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-Specialized Parameter Server EuroSys'16
- STRONGHOLD: Fast and Affordable Billion-scale Deep Learning Model Training SC'22
Categories
Sub Categories
- Parallelism Training: 59
- Serving-Inference: 44
- Optimization: 36
- GNN: 25
- GPU Cluster Management: 16
- Schedule and Resource Management: 15
- MoE: 14
- Framework: 9
- Communication: 8
- Training: 8
- Misc: 6
- LLM Serving: 5
- LLM Serving Framework: 5
- Fancy LLM: 5
- Energy: 4
- LLM FineTune: 2
- Fine-Tune: 2
- LoRA: 1
- LLM Inference (System Side): 1
- LLM Evaluation Platform: 1
- LLM Robustness and Debugging: 1
- Researcher: 1
- Attention: 1