awesomemlsys
An ML Systems Onboarding list
https://github.com/gpu-mode/awesomemlsys
Attention Mechanism
Distributed
- Singularity
- Local SGD
- OpenDiloco
- torchtitan
- pipedream
- Reducing Activation Recomputation in Large Transformer models
- Breaking the computation and communication abstraction barrier
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Megatron-LM
- JIT checkpointing
- DistServe
- HybridFlow
- Ray
Linear attention
- Flash linear attention - state-of-the-art linear attention models (and their papers)
Long context length
Performance Optimizations
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
- Efficiently Scaling Transformer Inference
- Making Deep Learning go Brrr from First Principles
- Fast Inference from Transformers via Speculative Decoding
- Grouped-Query Attention
- Orca: A Distributed Serving System for Transformer-Based Generative Models - read before the PagedAttention paper
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- Colfax Research Blog
- Sarathi LLM
- Epilogue Visitor Tree
Quantization
Sparsity
Speculative decoding