awesomemlsys
An ML Systems Onboarding list
https://github.com/gpu-mode/awesomemlsys
Attention Mechanism
Performance Optimizations
- Efficiently Scaling Transformer Inference
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
- Making Deep Learning Go Brrrr From First Principles
- Fast Inference from Transformers via Speculative Decoding
- Group Query Attention
- Orca: A Distributed Serving System for Transformer-Based Generative Models (read as background for the PagedAttention paper)
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- Colfax Research Blog
- Sarathi LLM
- Epilogue Visitor Tree
Sparsity
Distributed
Long context length
Quantization